Religion and Economic Change over a Century:
Linking Diverse Historical Data
New Technologies and Interdisciplinary Research on Religion
Harvard, 2010
Robert D. Woodberry, Juan Carlos Esparza
University of Texas at Austin
Sociology Department and Population Research Center
The challenge:
Roots of current differences may go back decades, even centuries. How can this be tested?
Religious records: valuable information, seldom used
Linking diverse sources over time
The data:
Source | Data | Characteristics
Electronic datasets | Recent censuses, surveys and geo-climatic data | Polygons & grids of cells
Historical data | Historic censuses and colonial records | Polygons
Protestant data | Missionaries, education, etc. | Points (mission stations); languages: English, Danish, Norwegian, French, German, and Spanish
Catholic data | Missionaries, education, etc. | Polygons (ecclesiastical jurisdictions); languages: English, Chinese, Italian, French, German, Latin, Spanish, Polish, and Portuguese
Problems:
Gathering complete data
Digitizing data & maps
Normalizing and linking data from different sources
Dealing with missing data
Creating database for geo-spatial statistical modeling
Complete data
Locating and evaluating “the universe” of sources
Temporal coverage
Spatial coverage
Data Quality
Variables included
Complete data
Complete data often available only in archives, e.g., “Vatican Secret Archives” & “Archives of Propaganda Fide”
Negotiating access
Locating, copying and digitizing sources
Spatial Linking
Issues:
1) Data given for different spatial units
2) Spatial units change over time
3) Accuracy of base map
Spatial Linking
1) Data given for different spatial units
Protestant: points
Catholic: polygons
Censuses, surveys, geo-climatic data:
different polygons and grids of cells
Spatial Linking
2) Spatial units change over time
Cities’ & towns’ names change
Catholic ecclesiastical jurisdictions evolve
National, provincial, and other state boundaries change
Spatial Linking
Why Important?
Connecting data to proper geographic referent
e.g., EJs (ecclesiastical jurisdictions) & provinces in 1913
Linking data over time
For statistical analysis
For imputation
(How does data in 1892 relate to data in 1934 and 2009?)
Spatial Linking
3) Historic maps inaccurate (limited usefulness)
Points:
Why it matters:
1) change over time
2) link to proper polygon
3) link to proper geo-climatic conditions
Find place in modern gazetteer
Link locations between sources using known alternative names and consistent institutions
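A minimal sketch of linking place names between sources through known alternative names. The alias table and the `link_place` helper are hypothetical and only illustrate the idea; a real gazetteer lookup would also carry coordinates and dates of use.

```python
# Hypothetical alias table: historical spelling -> modern gazetteer name.
# The entries are illustrative, not taken from the project's data.
ALIASES = {
    "tongataboo": "Tongatapu",
    "port jackson": "Sydney",
    "tahiti": "Tahiti",
    "huahine": "Huahine",
}

def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so spelling variants compare equal."""
    return " ".join(name.lower().split())

def link_place(name: str):
    """Return the modern gazetteer name, or None when the place is unknown."""
    return ALIASES.get(normalize(name))

records = ["Tongataboo", "Port  Jackson", "TAHITI", "Unknown Station"]
linked = {r: link_place(r) for r in records}

# Names that need manual checking against other sources.
unmatched = [r for r, match in linked.items() if match is None]
```

Keeping the unmatched names in a separate list makes it easy to see which stations still have to be located by hand.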
Spatial Linking
Historic maps inaccurate (limited usefulness)
Territories: map spaghetti
Why it matters:
1) Arbitrarily linking borders
2) Imputing data to artificial slivers
3) How to link data when no maps exist
Spatial Linking
Improving accuracy:
Start with accurate modern maps
Reconstruct border change from legal documents
Reconstruct border overlap from legal documents
(e.g., Catholics and state jurisdictions borders)
Bring modern borders back through time
Linking (cont.)
Accurate base maps:
Current world maps have insufficient accuracy (e.g., mission stations in ocean or wrong country)
Improve coastlines, islands, borders, and maritime boundaries
Remove slivers
Allows automatic linking of point and polygon data
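One way to sketch automatic linking of point data to polygon data is a ray-casting point-in-polygon test. This is not necessarily the method used in the project; the square units and station coordinates below are hypothetical, and a GIS package would normally do this step.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; the last vertex connects to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def link_points(points, polygons):
    """Map each point ID to the first polygon ID containing it, or None."""
    result = {}
    for pid, (x, y) in points.items():
        result[pid] = next(
            (gid for gid, poly in polygons.items() if point_in_polygon(x, y, poly)),
            None,
        )
    return result

# Hypothetical data: two adjacent square units and three mission stations.
polygons = {
    "unit_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "unit_B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = {"station_1": (1.0, 1.0), "station_2": (3.0, 0.5), "station_3": (5.0, 1.0)}
links = link_points(points, polygons)
```

A station that falls in no polygon (here `station_3`) comes back as `None`, which is the situation an inaccurate base map produces when a station lands in the ocean.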
Maritime Boundaries
Reconstructing historic borders:
Papal decrees document changes in EJs & identify corresponding government borders
Linking (cont.)
Reconstructing historic borders:
Check accuracy with country & empire records
Smallest unit in legal sources determines size of MCGUs and precision of data linking
When possible, use modern borders; when not, digitize borders from relatively accurate historical maps
Linking (cont.)
Determine Maximum Consistent Geographic Unit (MCGU) before creating digital maps
MCGUs foundation for all linking and imputation
Only one base map (easy to update)
All other geographic units are unions of MCGUs
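A minimal sketch of how MCGUs could be derived, assuming an MCGU is the largest area that no boundary in any source year ever splits. That reading is an assumption based on the statement that all other units are unions of MCGUs. The small areas ("atoms"), jurisdiction names, and years are hypothetical.

```python
from collections import defaultdict

# Hypothetical: each small area ("atom") belongs to one jurisdiction per year.
membership = {
    "a1": {"1913": "EJ_North", "1932": "EJ_North"},
    "a2": {"1913": "EJ_North", "1932": "EJ_Central"},  # moved by a border change
    "a3": {"1913": "EJ_South", "1932": "EJ_Central"},
    "a4": {"1913": "EJ_South", "1932": "EJ_South"},
    "a5": {"1913": "EJ_North", "1932": "EJ_North"},
}
years = ["1913", "1932"]

def build_mcgus(membership, years):
    """Group atoms that share the same jurisdiction in every year."""
    groups = defaultdict(list)
    for atom, by_year in membership.items():
        history = tuple(by_year[y] for y in years)
        groups[history].append(atom)
    # Assign stable IDs, ordered by jurisdiction history.
    ordered = sorted(groups.items())
    return {f"MCGU_{i}": sorted(atoms) for i, (_, atoms) in enumerate(ordered, 1)}

def units_as_unions(mcgus, membership, year):
    """Express every jurisdiction in `year` as a list of MCGU IDs."""
    unions = defaultdict(list)
    for mcgu_id, atoms in mcgus.items():
        # All atoms in an MCGU share a jurisdiction, so the first one suffices.
        unions[membership[atoms[0]][year]].append(mcgu_id)
    return dict(unions)

mcgus = build_mcgus(membership, years)
unions_1913 = units_as_unions(mcgus, membership, "1913")
unions_1932 = units_as_unions(mcgus, membership, "1932")
```

Because every jurisdiction in every year is a union of the same MCGUs, data reported for 1913 and 1932 units can be compared on a common footing.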
Linking (cont.)
Maximum Consistent Geographic Unit (MCGU)
All point and cell data link to MCGUs
Protestant data
Geo-climatic data
Missionary mortality data
Also allow contextual analysis
(spatial autocorrelations, etc.)
Minimizes over-aggregation of data
Linking (cont.)
Linking geo-climatic data (endogeneity)
Aggregate as grid of cells: Grid of boxes covering world
Assign unique IDs and vectorize raster data
Normalize so boxes perfectly overlap and IDs match between layers
(very hard and time consuming)
Aggregate for MCGUs
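The last two steps can be sketched as follows, assuming the layers have already been normalized so that cell IDs match. The layer values and the cell-to-MCGU lookup are hypothetical, and the unweighted mean assumes each cell lies wholly inside one MCGU.

```python
from collections import defaultdict

# Hypothetical layers keyed by the same cell IDs (the result of normalization).
rainfall = {"c001": 1200.0, "c002": 1000.0, "c003": 400.0, "c004": 600.0}
elevation = {"c001": 50.0, "c002": 150.0, "c003": 900.0, "c004": 1100.0}
# Hypothetical lookup from cell ID to the MCGU containing the cell.
cell_to_mcgu = {"c001": "MCGU_1", "c002": "MCGU_1", "c003": "MCGU_2", "c004": "MCGU_2"}

def aggregate(layer, cell_to_mcgu):
    """Average a cell-level layer within each MCGU."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cell_id, value in layer.items():
        mcgu = cell_to_mcgu[cell_id]
        totals[mcgu] += value
        counts[mcgu] += 1
    return {m: totals[m] / counts[m] for m in totals}

# Layers can only be combined if their cell IDs match exactly.
assert rainfall.keys() == elevation.keys() == cell_to_mcgu.keys()

mean_rainfall = aggregate(rainfall, cell_to_mcgu)
mean_elevation = aggregate(elevation, cell_to_mcgu)
```

An area-weighted mean would be needed where cells straddle MCGU borders; that refinement is left out here.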
Linking (cont.)
Linking mortality data (endogeneity)
Data on over 100,000 missionary lives
Calculate comparative mortality estimates by linking lives to
1) points (mission stations)
2) polygons (Countries, EJs & MCGUs)
Can be generalized to other areas based on geo-climatic conditions, etc.
Name | Sex | Born | Sailed | Loc_01 | Begin | End | Loc_02 | Begin2 | End2
Cover, James Fleet | 1 | 1762 | 1796 | Tahiti | 1797 | 1798 | Port Jackson | 1798 | 1800
Eyre, John | 1 | 1768 | 1796 | Tahiti | 1797 | 1808 | Huahine | 1808 | 1809
Jefferson, John | 1 | 1760 | 1796 | Tahiti | 1797 | 1807 | | |
Lewis, Thomas | 1 | 1765 | 1796 | Tahiti | 1797 | 1799 | | |
Bicknell, Henry | 1 | 1766 | 1796 | Tahiti | 1797 | 1808 | Port Jackson | |
Bowell, Daniel | 1 | 1774 | 1796 | Tongataboo | 1797 | 1799 | | |
Broomhall, Benjamin | 1 | 1776 | 1796 | Tahiti | 1797 | 1801 | | |
Buchanan, John | 1 | 1765 | 1796 | Tongataboo | 1797 | 1800 | Port Jackson | 1800 | 1800
Cooper, James | 1 | 1768 | 1796 | Tongataboo | 1797 | 1800 | Port Jackson | 1800 | 1801
Cock, John | 1 | 1773 | 1796 | Tahiti | 1797 | 1798 | Port Jackson | 1798 |
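A mortality rate needs exposure time in the denominator. The sketch below computes person-years per location from three rows of the table above, assuming exposure for a posting is End minus Begin. The table as shown has no death field, so only the denominator is built here; postings with a missing end year are left out.

```python
from collections import defaultdict

# Three rows from the table, as (name, [(location, begin, end), ...]).
careers = [
    ("Cover, James Fleet", [("Tahiti", 1797, 1798), ("Port Jackson", 1798, 1800)]),
    ("Eyre, John", [("Tahiti", 1797, 1808), ("Huahine", 1808, 1809)]),
    ("Bowell, Daniel", [("Tongataboo", 1797, 1799)]),
]

def person_years_by_location(careers):
    """Sum years served at each location (assumption: exposure = end - begin)."""
    exposure = defaultdict(int)
    for _name, postings in careers:
        for location, begin, end in postings:
            exposure[location] += end - begin
    return dict(exposure)

exposure = person_years_by_location(careers)
```

With deaths per location added from the life records, dividing deaths by these person-years would give rates that can be compared across stations, countries, EJs, or MCGUs.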
Missing Data
Problems:
Changing categories between sources/years
Inconsistent categories within same source
Missing places in source
Inconsistent years between sources
Missing Data (cont.)
Strategies:
Finding missing data:
Letters of bishops to Pope
Triangulating between sources
- To identify missing institutions & organizations
- To identify estimates from inconsistencies
- To fill in missing data
Missing Data (cont.)
Strategies:
Imputing missing data (multiple imputation):
Using: 1) trend over time in MCGUs
- e.g., using linked MCGUs in 1913 & 1932
to estimate 1923
2) pattern with neighbor
Can compare results with and without imputed data
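The trend-over-time strategy can be sketched as linear interpolation between two linked years, followed by several noisy draws around that estimate. This is a crude stand-in for multiple imputation, not the project's actual model; the school counts and the spread `sd` are hypothetical.

```python
import random

def interpolate(year, year0, value0, year1, value1):
    """Estimate a value on the linear trend between two observed years."""
    fraction = (year - year0) / (year1 - year0)
    return value0 + fraction * (value1 - value0)

def impute_draws(center, sd, m, seed=0):
    """Return m plausible values scattered around the trend estimate."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [rng.gauss(center, sd) for _ in range(m)]

# Hypothetical counts of schools in one MCGU in the two observed years.
schools_1913, schools_1932 = 10.0, 29.0
estimate_1923 = interpolate(1923, 1913, schools_1913, 1932, schools_1932)

# Several imputed datasets would each use one of these draws.
draws = impute_draws(estimate_1923, sd=2.0, m=5)
```

Running the analysis once per draw and pooling the results is what allows a comparison of findings with and without imputed data.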
An example: Mexico
Reconstruct all locality changes back to 1815
Reconstruct all EJ changes from 1850
Link historical censuses & modern surveys
Re-aggregate data according to any geographic unit (MCGU or larger)
Mexico (cont.)
Once completed:
All census, Catholic, and Protestant data linked for about 120 years
Multiple current surveys linked so can analyze modern consequences
Longitudinal database of MCGUs
Mexico (cont.)
Interrupted Time Series:
impact of introducing Protestant missions on Catholic church behavior
impact of Catholic and Protestant interventions on the change in literacy between censuses
Cumulative Influence:
Endogeneity: test correlates of when and where Protestants and Catholics invest in particular areas.
Thank You!