Reusing Legacy Data: Irish Historic Vital Registration Data, 1864-1913
Presentation given by Dolores Grant at the European Society of Historical Demography conference, Alghero, Sardinia, 26 September 2014
Dolores Grant
Dr Ciara Breathnach, Dr Sandra Collins, Rebecca Grant
Irish Record Linkage 1864-1913
Irish Record Linkage project 1864-1913
Irish Record Linkage is an Irish Research Council-funded project running until December 2015
To construct a Knowledge Platform by applying semantic technologies to vital-registration data generously shared by the Office of the Registrar General
To address research queries around infant and maternal mortality rates and patterns in Dublin
Irish Record Linkage project 1864-1913
Collaboration between the Digital Repository of Ireland at the Royal Irish Academy, the University of Limerick and Insight@NUI Galway
Principal Investigators: Dr Ciara Breathnach (UL), Dr Sandra Collins (DRI), Dr Stefan Decker (Insight)
Project Team: Dr Brian Gurrin (UL), Dr Christophe Debruyne (Insight/DRI), Dr Oya Beyan (Insight), Rebecca Grant (DRI), Dolores Grant (DRI)
University of Limerick
Its mission is to promote and advance learning and knowledge through teaching, research and scholarship in an environment which encourages innovation and upholds the principles of free enquiry and expression.
The Faculty of Arts, Humanities and Social Sciences prides itself on the quality of its teaching and its commitment to research and places a strong emphasis on the role of debate and discussion in the development of knowledge and analytical skills.
The Digital Repository of Ireland
Based in the Royal Irish Academy (Ireland's Academy for the Sciences and Humanities)
DRI is a trusted digital repository for Humanities and Social Sciences data
Linking and preserving the rich data held by Irish institutions, providing a central internet access point and multimedia tools
Focal point for the development of national guidelines and policy for digital preservation and access
INSIGHT@NUI Galway
Insight brings together leading Irish academics from 5 of Ireland's leading research centres (DERI, CLARITY, CLIQUE, 4C, TRIL), in key areas of priority research including:
The Semantic Web, Sensors and the Sensor Web, Social network analysis, Decision Support and Optimization, and Connected Health.
Irish Historic Vital Registration Data
1845: A registration of marriages act was introduced to gather official statistics on marriages in the established Church of Ireland
1864: The first year in which births, deaths and marriages (including Catholic marriages) were registered, following the establishment of a complete Irish civil registration system in 1863
Ireland 1864-1912:
2.9 million birth records
4.9 million death records
3.18 million marriages
Dublin 1864-1912:
609,720 birth records
537,635 death records
330,605 marriage records (1845-1913)
The Linked Data Concept
A method of publishing structured data on the Web, allowing it to be connected and enriched, and facilitating linking between related resources.
A key principle of Linked Data is that HTTP URIs are used to name the semantic elements of the dataset
Linked Data standards such as RDF allow semantic definitions to be applied to information, using statements called ‘triples’ in the form subject, predicate, object.
The Linked Data Concept
This example describes the subject (James Joyce) and his relationship (predicate) to an object (Dublin). By semantically separating the elements of the information (that James Joyce was born in Dublin), datasets stored in this way can be easily queried.
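The James Joyce statement can be sketched as subject, predicate, object triples in a few lines of Python. This is only an illustration of the idea: the `http://example.org/` URIs and the `match` helper are hypothetical and are not the project's identifiers or tooling.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace, used purely for illustration.
EX = "http://example.org/"

triples: List[Triple] = [
    (EX + "James_Joyce", EX + "bornIn", EX + "Dublin"),
    (EX + "James_Joyce", EX + "name", "James Joyce"),
    (EX + "Dublin", EX + "name", "Dublin"),
]


def match(store: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Who was born in Dublin?" becomes a pattern with an unknown subject.
born_in_dublin = [s for s, _, _ in match(triples, p=EX + "bornIn", o=EX + "Dublin")]
print(born_in_dublin)
```

Because subject, predicate and object are stored separately, the same `match` call answers other questions simply by leaving a different position open.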
Vital registration data: birth, death, marriage records for Dublin
TIFF images of pre-digitised indexes and registers of birth, death and marriage
General Register Office database for these records
General Register Office Data
Marriage Records
Columns: Register TIFF; Index TIFF; System 1845-1901; System 1902-c.1912
Registrar’s District Registration District District District
Marriage solemnised at
Parish
Union
County County County
Province Province
Number in register Entry number
When married Year of event Year of event , Date of marriage
When registered Returns year Returns year
Returns quarter Returns quarter
Name and surname Name Forename, Surname Forename, Surname
Partner’s surname
Age
Sex
Condition
Rank or profession
Residence at the time of marriage
Father’s name and surname
Rank or profession of father
Celebrant
Witnesses
Signature of Registrar
Signature of Superintendent Registrar and date
Stamp Number Stamp number Stamp number
Volume number Returns volume number Returns volume number
Page number Page number Returns page number Returns Page number
Stamped number Page ID Page ID
2nd Stamped number
Index entry number Index entry number
Index page number
Birth Records
Columns: Register TIFF; Index TIFF; System Pre 1900; System Post 1900
Superintendent Registrar’s District
Registrar’s District Registration district District District
Union
County County County
Province Province
Number in register Entry number
Date & place of birth Year of event Date of birth, year of event
Name (if any) Name Forename, Surname Forename, Surname
Sex Sex
Name, surname & dwelling place of father
Name & surname & maiden surname of mother
Mother’s maiden name
Rank or profession of father
Signature, qualification, and residence of informant
When Registered Returns year Returns year
Returns quarter Returns quarter
Signature of Registrar
Signature of Superintendent Registrar and date
Baptismal name if added after registration of birth and date
Stamp Number Stamp number Stamp number
Volume number Returns volume number Returns volume number
Page number Page number Returns page number Returns page number
Stamped number Page ID
2nd Stamped number
Index entry number
Index page number
Death Records
Columns: Register TIFF; Index TIFF; System
Superintendent Registrar’s District
Registrar’s District Registration District District
District
Union
County County
Province
Number in register
Date and place of death Year of event
Name and surname Name Forename, Surname
Sex
Condition
Age last birthday Age Age at death
Rank, profession or occupation
Certified cause of death and duration of illness
Signature, qualification and residence of informant
When registered Returns year
Returns quarter
Signature of Registrar
Signature of Superintendent Registrar and date
Stamp number Stamp number
Volume number Returns volume number
Page number Page number Returns page number
Stamped number Page ID
2nd Stamped number
Index entry number
Index page number
Research Questions
Identifying the record fields that are necessary to maintain the archival authenticity of the records and answer the research questions:
• How many women died within 42 days following childbirth due to complications related to labour and how does that figure correspond with the official reports?
• Which women died of causes that can be attributed to maternal death, but for which no corresponding birth certificate exists?
• How did various socio-economic conditions affect maternal and infant mortality rates?
Competency questions to construct the Ontology
ID Competency Question
C01 Women who died within 42 days after giving birth (the date of birth counted as day 1 and day 42 included)
C02 Women who died within 42 days after giving birth AND whose death certificate mentions ‘complication 1’
C03 Women who died within 42 days after giving birth AND whose death certificate mentions ‘complication 2’
C04 Women with official maternal death reports including “XXXX”
C05 Women with official maternal death reports including “cause 1”
C06 Women with official maternal death reports including “cause 2 and cause 3 together”
C07 For each record in C04, find those with a corresponding birth record (the date of death counted as day 1 and day 42 included)
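The day-counting rule in C01 (date of birth is day 1, day 42 is included) is easy to get wrong by one, so a small sketch may help. The function names and the sample dates below are hypothetical; only the counting rule comes from the competency question.

```python
from datetime import date


def day_number(birth: date, death: date) -> int:
    """Position of the death relative to the birth, with the date of birth as day 1."""
    return (death - birth).days + 1


def within_42_days(birth: date, death: date) -> bool:
    """True when the death falls on day 1 to day 42 inclusive."""
    return 1 <= day_number(birth, death) <= 42


# Hypothetical dates chosen to sit on either side of the boundary.
birth = date(1900, 8, 8)
print(within_42_days(birth, date(1900, 8, 8)))   # same day is day 1
print(within_42_days(birth, date(1900, 9, 18)))  # day 42, still included
print(within_42_days(birth, date(1900, 9, 19)))  # day 43, excluded
```

In terms of a plain date difference, the rule accepts gaps of 0 to 41 days between birth and death.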
[Diagram: GRO Database; Extract, Transform (creation of RDF triples), Load; GRO Triplestore, described by the GRO Ontology; Digital Archivist (consulted by, amends/curates)]
Storage Model
Metadata that can be queried declaratively with a W3C standard
GRO Triplestore
Triplestore 2 Data Analysis
Transformation from one model to another
• SPIN – SPARQL Inferencing Notation
• SWRL / RuleML
• SPARQL Construct
• …
SEPARATION OF CONCERNS
GRO Records annotation vs. Data Analysis
<#B000-001> a irl:BirthRecord;
    irl:on "1900-08-08";
    irl:name "James";
    irl:mother "Mary Murphy";
    irl:place "Castle Road"; …
<#B010-022> a irl:BirthRecord;
    irl:on "1902-04-19";
    irl:name "Patrick";
    irl:mother "Mary Murphy";
    irl:place "Castle Road"; ...
<#B022-051> a irl:BirthRecord;
    irl:on "1904-09-20";
    irl:name "Agnes";
    irl:mother "Mary Murphy";
    irl:place "Convent Hill"; ...
<#B050-003> a irl:BirthRecord;
    irl:on "1905-02-18";
    irl:name "Michael";
[Diagram: TRANSFORMATION; ONTOLOGY MATCHING; #1 Mary Murphy, #2 Mary Murphy, #3 Mary Murphy, #4 Mary Murphy, connected by three owl:sameAs links]
All generated triples are stored separately for data analytics ...
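The slides do not state how the mothers are matched, so the following is only a rough sketch of the idea behind the owl:sameAs links. It assumes, purely for illustration, that two mothers are the same person when name and place match exactly; real record linkage would need more tolerant criteria. It uses the three complete birth records above (the fourth is truncated in the slides), and the `/mother` identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# The three complete birth records shown above.
records = [
    {"id": "B000-001", "name": "James", "mother": "Mary Murphy", "place": "Castle Road"},
    {"id": "B010-022", "name": "Patrick", "mother": "Mary Murphy", "place": "Castle Road"},
    {"id": "B022-051", "name": "Agnes", "mother": "Mary Murphy", "place": "Convent Hill"},
]


def same_as_links(recs):
    """Group records on (mother, place) and emit one owl:sameAs triple per matched pair."""
    groups = defaultdict(list)
    for r in recs:
        # Assumed matching key: exact mother name and place, ignoring case and outer spaces.
        key = (r["mother"].strip().lower(), r["place"].strip().lower())
        groups[key].append(r["id"])
    links = []
    for ids in groups.values():
        for a, b in combinations(ids, 2):
            # Hypothetical identifiers for the mother entity of each record.
            links.append((a + "/mother", "owl:sameAs", b + "/mother"))
    return links


links = same_as_links(records)
print(links)
```

Under this assumption the mothers of James and Patrick are linked, while the Mary Murphy of Convent Hill stays a separate entity.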
#1 Mary Murphy: James (1900-08-08), Patrick (1902-04-19), Michael (1905-02-18)
Intervals between births: 619 days, 1036 days
Average sibship interval = 827.5 days
Data analysis on the generated triples
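The sibship interval figures in the worked example can be reproduced with standard date arithmetic. This is a minimal sketch using the three birth dates linked to #1 Mary Murphy; the function name is hypothetical and the project's own analysis tooling is not shown in the slides.

```python
from datetime import date

# Birth dates of James, Patrick and Michael, as given in the records.
births = [date(1900, 8, 8), date(1902, 4, 19), date(1905, 2, 18)]


def sibship_intervals(dates):
    """Days between each pair of consecutive births, after sorting by date."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


intervals = sibship_intervals(births)
average = sum(intervals) / len(intervals)
print(intervals, average)  # [619, 1036] 827.5
```

The second interval spans 29 February 1904, which the date arithmetic accounts for automatically.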
Data Challenges
• Data security: transfer, storage and use by authorised parties
• Data protection best practice
• Quantity of data
• Varying levels of detail, e.g. causes of death
• Establishing maternal death - fever
• Archaic medical terms
• Variances in record subject names and places
• Place name changes over time
The Irish Record Linkage Knowledge Platform
• State-of-the-art linked data and ontology-based analysis platform for historical 'big data'
• Platform within a secure, closed system
• Prepared to allow formulation of the specific research queries
• Query interface to allow for the historical analysis of the data
• Potential expansion to include additional contextualising datasets
@IRL_Project http://irishrecordlinkage.wordpress.com/