Upload: christophe-debruyne

Post on 01-Jul-2015

DESCRIPTION

Presentation by Christophe Debruyne to the 6th International Workshop on Knowledge Representation for Health Care (KR4HC 2014), Vienna, Austria, July 21, 2014. Authors: Oya Beyan, Ciara Breathnach, Sandra Collins, Christophe Debruyne, Stefan Decker, Dolores Grant, Rebecca Grant, and Brian Gurrin.

Presentation also available at http://www.slideshare.net/dri_ireland/linked-vitalregistrationdatalongitudinalhealthhistories

Paper available at: https://www.researchgate.net/publication/264200941_Towards_Linked_Vital_Registration_Data_for_Reconstituting_Families_and_Creating_Longitudinal_Health_Histories?ev=prf_pub

ABSTRACT: The Irish Record Linkage 1864-1913 project aims to create a knowledge base containing historical birth, marriage and death records encoded in RDF to reconstitute families and create longitudinal health histories. The goal is to interlink the different persons across these records as well as with supplementary datasets that provide additional context. With the help of knowledge engineers who will create the ontologies and set up the platform, and the digital archivist who will curate, ingest and maintain the RDF, the historians will be able to analyse reconstructed "virtual" families of Dublin in the 19th and early 20th centuries, allowing them to address questions about the accuracy of officially reported maternal mortality and infant mortality rates. In the longer term, this platform will allow researchers to investigate how official historical datasets can contribute to modern-day epidemiological planning.

TRANSCRIPT

Page 1: Towards Linked Vital Registration Data for Reconstituting Families and Creating Longitudinal Health Histories

Oya Beyan, Ciara Breathnach, Sandra Collins, Christophe Debruyne, Stefan Decker, Dolores Grant, Rebecca Grant, and Brian Gurrin

21st of July 2014 – KR4HC Workshop – Vienna, Austria

Page 2

Irish Record Linkage, 1864-1913

• Developing a platform that applies semantic technologies to historical birth, death and marriage certificates.

• Answering questions such as: “How accurate are historic maternal mortality rates (MMR) and infant mortality rates (IMR) for Dublin?”

• Team consists of researchers (historians), digital archivists, and knowledge engineers.

Page 3

Data: General Register Office (GRO) Records

• Vital registration data

– Birth certificates

– Death certificates

– Marriage records

• Digitised TIFF images of hardcopy indexes and registers.

• 2 TB of data

• Database describing the digitised records allowing searches on some fields.


©General Records Office of Ireland 2014

Page 4

Challenges

• Certified causes of death that can be attributed to maternal death

– Within 42 days after labour – before (1864) it was 12

– Septicemia (blood poisoning), Fever, …

– “Corresponding” birth certificate?

• Death certificates with no corresponding birth certificate

• “Gaps” in sibship interval, even though no birth or death certificates can be found.

• The terminology used pre-1900. E.g., “debile” to denote weak or a failure to thrive.

• Capturing the socio-economic status of the families via, for instance, the professions and ranks of fathers.
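
Two of the challenges above can be illustrated with simple date arithmetic: checking whether a mother's death falls within the 42-day window after a birth, and spotting unusually long intervals between siblings. The Python sketch below is only an illustration under stated assumptions. The function names, the use of plain dates as input, the inclusive treatment of day 42, and the three-year gap threshold are all hypothetical and are not taken from the project.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Window taken from the slide: a death within 42 days after labour.
# Treating day 42 itself as inside the window is an assumption.
MATERNAL_DEATH_WINDOW = timedelta(days=42)


def is_candidate_maternal_death(birth_date: date, mother_death_date: date) -> bool:
    """Return True if the mother's death falls 0 to 42 days after the birth."""
    delay = mother_death_date - birth_date
    return timedelta(0) <= delay <= MATERNAL_DEATH_WINDOW


def sibship_gaps(birth_dates: List[date], max_gap_days: int = 3 * 365) -> List[Tuple[date, date]]:
    """Return consecutive birth pairs whose interval exceeds max_gap_days.

    The three-year default is an illustrative threshold only.
    """
    ordered = sorted(birth_dates)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if (later - earlier).days > max_gap_days
    ]


# Hypothetical example dates (not real records).
births = [date(1870, 3, 1), date(1872, 1, 15), date(1878, 6, 30)]
print(is_candidate_maternal_death(date(1878, 6, 30), date(1878, 7, 20)))
print(sibship_gaps(births))
```

A flagged pair is only a candidate for closer inspection by a historian; a long interval may reflect a missing certificate rather than an absence of births.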

Page 5

Conceptual Architecture

[Figure: conceptual architecture. Panels: PRESERVATION, DATA ANALYTICS. Labels: Digital Archivist, Repository, GRO records as RDF, Updater, Updates, Linker, Links, Triplestore, SPARQL endpoint / Linked Data Server, Analytics, Researcher.]

Links to external datasets: e.g., Logainm – a database of Irish historical and contemporary place names to provide additional context.

Page 6

Development of 2 ontologies

[Figure: GRO Triplestore and Triplestore 2 (Data Analysis), divided by a vertical “SEPARATION OF CONCERNS” label.]

Obviously, due to the sensitive nature of the data, data protection is key.

Transformation from one model to another

• SPIN – SPARQL Inferencing Notation

• SWRL / RuleML

• SPARQL Construct

• …
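
To make the idea of transforming one model into another more concrete, the following Python sketch derives person-centred triples from record-centred triples, in the spirit of a SPARQL CONSTRUCT query. It is not the project's transformation: the `ex:` properties, the node naming scheme, and the sample values are all hypothetical, and triples are represented as plain string tuples rather than in a real triplestore.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def construct_family_triples(record_triples: List[Triple]) -> Set[Triple]:
    """Derive analysis-model triples (persons, mother links) from birth records.

    All ex: terms are hypothetical placeholders for illustration.
    """
    # Select the subjects typed as birth records (the "WHERE" part).
    births = {s for s, p, o in record_triples
              if p == "rdf:type" and o == "irl:BirthRecord"}
    derived: Set[Triple] = set()
    for record in sorted(births):
        props = {p: o for s, p, o in record_triples if s == record}
        if "ex:childName" in props and "ex:motherName" in props:
            child = f"{record}/child"
            mother = f"{record}/mother"
            # Emit the new triples (the "CONSTRUCT" part).
            derived |= {
                (child, "rdf:type", "ex:Person"),
                (mother, "rdf:type", "ex:Person"),
                (child, "ex:name", props["ex:childName"]),
                (mother, "ex:name", props["ex:motherName"]),
                (child, "ex:hasMother", mother),
            }
    return derived


# Hypothetical record-model input (not real data).
records: List[Triple] = [
    ("ex:birth1", "rdf:type", "irl:BirthRecord"),
    ("ex:birth1", "ex:childName", "Child A"),
    ("ex:birth1", "ex:motherName", "Mother A"),
    ("ex:death1", "rdf:type", "irl:DeathRecord"),
]
analysis = construct_family_triples(records)
print(len(analysis))
```

Keeping the derived triples in a separate set mirrors the separation of concerns on the slide: the record model stays untouched, and the analysis model can be regenerated whenever the rules change.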

Page 7

Development of 2 ontologies

• 2 ontologies were developed – separation of concerns

• First ontology for describing the contents of records

– OWL 2 shallow, “flat ontology”

• Second ontology for data analysis

– OWL 2 + rules

– Rules to capture background and domain knowledge

– Developed by having the historians formulate competency questions (Grüninger and Fox)

– Captured graphically using Object Role Modelling

Page 8

Graphical Representation in ORM

Page 9

### Prefixes omitted …

irl:Record a owl:Class ;
    rdfs:label "Record" ; .

irl:Certificate a owl:Class ;
    rdfs:label "Certificate" ;
    rdfs:subClassOf irl:Record ; .

irl:BirthRecord a owl:Class ;
    rdfs:label "Birth Record" ;
    rdfs:subClassOf irl:Certificate ; .

irl:DeathRecord a owl:Class ;
    rdfs:label "Death Record" ;
    rdfs:subClassOf irl:Certificate ; .

irl:MarriageRecord a owl:Class ;
    rdfs:label "Marriage Record" ;
    rdfs:subClassOf irl:Record ; .

irl:Return a owl:Class ;
    rdfs:label "Return" ; .
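
The class hierarchy in the Turtle above can be walked with a few lines of Python. The sketch below copies the `rdfs:subClassOf` links from the slide into a dictionary and follows them upward. The dictionary layout and the `superclasses` helper are assumptions made for illustration, and the loop relies on the hierarchy having no cycles, which holds for the classes shown.

```python
from typing import Dict, List

# Direct rdfs:subClassOf links as they appear on the slide.
SUBCLASS_OF: Dict[str, str] = {
    "irl:Certificate": "irl:Record",
    "irl:BirthRecord": "irl:Certificate",
    "irl:DeathRecord": "irl:Certificate",
    "irl:MarriageRecord": "irl:Record",
}


def superclasses(cls: str) -> List[str]:
    """Return all ancestors of cls, nearest first, following rdfs:subClassOf."""
    chain: List[str] = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


print(superclasses("irl:BirthRecord"))     # ['irl:Certificate', 'irl:Record']
print(superclasses("irl:MarriageRecord"))  # ['irl:Record']
print(superclasses("irl:Return"))          # []
```

Note that, as declared on the slide, a marriage record is a record but not a certificate, and `irl:Return` has no superclass.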

Page 10

Conclusions

• Presented the problem and highlighted the challenges

• Developed two ontologies

– Encoding contents of digitized GRO records for long-term digital preservation → DRI

– Data analytics to answer the researcher’s question – in this case a historian

• Data exploration and annotation of the records started on a subset of the dataset
