open addresses symposium slides
DESCRIPTION
My slides from the ODI OpenAddresses Symposium held on 8th August. OpenAddresses is about providing an open address gazetteer for the UK as an alternative to Royal Mail PAF.
TRANSCRIPT
Open Addresses Symposium: Meeting the Challenges
Address inference from open data
John Murray
Open Addresses Symposium 2
Sources of Addresses
• Land Registry
• Companies House
• National Register of Social Housing (NROSH)
• NHS – GP surgeries, hospitals etc.
• Lists of schools
• Government department asset lists
• Scottish gazetteers (are they really open?)
8 August 2014
Sources of Spatial Information
• Ordnance Survey:
  – Code-Point Open
  – OS Locator
  – OS Gazetteer
  – Street View
  – Named places, settlement seeds, DLUA boundaries, parishes
• ONS:
  – ONSPD postcode directory
  – Built-up areas
  – Census boundaries
• Land Registry:
  – Cadastral polygons (dispute about whether they are open)
• DfT:
  – National Public Transport Gazetteer
Proposal
• Build a street and places gazetteer, to which address points (PAON and SAON) may be attached.
• Use spatial data to verify data loaded from open sources.
• Apply a confidence score to each record based on:
  – Spatial integrity
  – Frequency of appearance within and across sources
• Infer towns and localities by filling gaps.
• Street layout analysis:
  – Position of buildings by pixel analysis
  – Postcode to numbering, e.g. odds and evens
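One way the confidence score might be combined is sketched here. This is an illustration only, not the scoring used in the prototype: the weights, the 50 m distance threshold, and the names `AddressRecord` and `confidence_score` are all assumptions.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class AddressRecord:
    # Hypothetical record layout, for illustration only.
    street: str
    postcode: str
    easting: float
    northing: float
    sources: frozenset   # names of the sources in which the record appears
    occurrences: int     # how many times it appears across all sources


def confidence_score(rec, centroid, max_dist_m=50.0, total_sources=4):
    """Combine spatial integrity and source frequency into a 0..1 score.

    The weights (0.5 / 0.3 / 0.2) and thresholds are assumed values.
    """
    # Spatial integrity: 1.0 at the postcode centroid, falling linearly
    # to 0.0 at max_dist_m and beyond.
    dist = hypot(rec.easting - centroid[0], rec.northing - centroid[1])
    spatial = max(0.0, 1.0 - dist / max_dist_m)
    # Frequency across sources: share of the sources that contain the record.
    across = len(rec.sources) / total_sources
    # Frequency within sources: saturates at three occurrences.
    within = min(rec.occurrences, 3) / 3
    return round(0.5 * spatial + 0.3 * across + 0.2 * within, 3)
```

A record sitting on the centroid, seen in all four sources at least three times, scores 1.0; anything further away or less often seen scores lower.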
Pixel Analysis
• Overlay vector streets and postcode centroids on OS StreetView
• Use in conjunction with OS Locator for context and extent.
• Analyse pixel colour within a buffer either side of the road to estimate the extent of buildings.
• Can be used to:
  – Verify other data
  – Infill missing properties
  – Assign streets to postcodes more accurately
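The buffer analysis could look roughly like the sketch below, which works on a raster held as nested lists of RGB tuples. The building colour, the tolerance, the buffer sizes, and the function names are all assumptions; the real fill colour would have to be read from the OS StreetView raster itself.

```python
from math import hypot

BUILDING_RGB = (230, 190, 150)  # hypothetical building fill colour, not the real value


def is_building(pixel, tolerance=30):
    """True if every channel is within `tolerance` of the assumed building colour."""
    return all(abs(c - b) <= tolerance for c, b in zip(pixel, BUILDING_RGB))


def building_fraction(raster, start, end, buffer_px=10, skip_px=2):
    """Estimate the share of building pixels on each side of a road segment.

    raster: rows of (r, g, b) tuples, indexed raster[y][x]
    start, end: (x, y) pixel coordinates of the road centreline segment
    Returns (fraction on the +normal side, fraction on the -normal side).
    """
    (x0, y0), (x1, y1) = start, end
    length = hypot(x1 - x0, y1 - y0)
    if length == 0:
        raise ValueError("start and end must differ")
    # Unit normal to the road direction.
    nx, ny = -(y1 - y0) / length, (x1 - x0) / length
    hits = {1: 0, -1: 0}
    seen = {1: 0, -1: 0}
    steps = int(length) + 1
    for i in range(steps):
        t = i / max(steps - 1, 1)
        cx, cy = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
        for side in (1, -1):
            # Skip the carriageway itself, then walk out to the buffer edge.
            for d in range(skip_px, buffer_px + 1):
                px, py = round(cx + side * d * nx), round(cy + side * d * ny)
                if 0 <= py < len(raster) and 0 <= px < len(raster[0]):
                    seen[side] += 1
                    hits[side] += is_building(raster[py][px])
    return tuple(hits[s] / seen[s] if seen[s] else 0.0 for s in (1, -1))
```

A high fraction on one side and a low one on the other would suggest a street built up on one side only, which bears on how house numbers are likely to run.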
Pixel Analysis
Adding Land Registry
Maximising Available Data
• Using ONSPD, correct postcodes where there is an unambiguous coordinate match from a terminated postcode to a new one.
• Accounts for 50% of retired codes.
• Correct misspellings by reference to dictionaries using lexical analysis.
• Refer to earlier versions of the data.
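The two correction steps might be sketched as below. The row layout and function names are assumptions, and the fuzzy matching uses Python's standard `difflib` as a stand-in; the slides do not say which lexical method was actually used.

```python
from collections import defaultdict
from difflib import get_close_matches


def map_retired_postcodes(rows):
    """Map terminated postcodes to live ones sharing the same coordinate.

    rows: iterable of (postcode, easting, northing, terminated) tuples
    (an assumed layout, not the ONSPD file format).
    Only unambiguous matches are returned: exactly one live postcode
    at the retired postcode's coordinate.
    """
    rows = list(rows)
    live_at = defaultdict(set)
    for postcode, e, n, terminated in rows:
        if not terminated:
            live_at[(e, n)].add(postcode)
    mapping = {}
    for postcode, e, n, terminated in rows:
        candidates = live_at.get((e, n), set())
        if terminated and len(candidates) == 1:
            mapping[postcode] = next(iter(candidates))
    return mapping


def correct_spelling(name, dictionary, cutoff=0.85):
    """Return the closest dictionary entry, or the name unchanged if none is close."""
    matches = get_close_matches(name.upper(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else name
```

Retired postcodes with two or more live candidates at the same coordinate are left unmapped rather than guessed.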
Source Audits
• Land Registry – Good quality, kept up to date, few errors. Covers England and Wales.
• Companies House – Data quality issues, particularly older companies. Covers UK.
• NROSH – Variable quality. Covers England.
Prototype
• Contains all current GB postcodes.
• Streets added where possible.
• Localities added where possible.
• Corrects retired postcodes where possible.
• Shows nearest postcodes if not.
• Built from 4 sources, with gaps filled by inference.
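The "nearest postcodes" fallback could be as simple as a distance ranking over postcode centroids. The sketch below is a brute-force illustration with assumed names and data layout; a dataset covering all GB postcodes would more likely need a spatial index.

```python
from math import hypot


def nearest_postcodes(target, centroids, k=3):
    """Return the k postcodes whose centroids lie closest to `target`.

    target: (easting, northing)
    centroids: {postcode: (easting, northing)} (an assumed layout)
    """
    ranked = sorted(
        centroids,
        key=lambda pc: hypot(centroids[pc][0] - target[0], centroids[pc][1] - target[1]),
    )
    return ranked[:k]
```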
Initial Results
OS_Locator  LandReg  Companies  NROSH      Count  Percent
0           0        0          1          5,042    0.29%
0           0        1          0         18,990    1.09%
0           0        1          1            606    0.03%
0           1        0          0        111,553    6.40%
0           1        0          1         25,514    1.46%
0           1        1          0         20,842    1.20%
0           1        1          1          5,574    0.32%
1           0        0          0        227,971   13.07%
1           0        0          1         41,773    2.40%
1           0        1          0        115,449    6.62%
1           0        1          1          7,065    0.41%
1           1        0          0        381,917   21.90%
1           1        0          1        166,608    9.55%
1           1        1          0        348,669   19.99%
1           1        1          1        122,299    7.01%
Unmatched                                144,218    8.27%
Total                                  1,744,090  100.00%
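A breakdown of this shape can be produced by tallying, for each record, which of the four sources it was found in. The sketch below shows the idea; the function names and the input layout (one set of source names per record) are assumptions.

```python
from collections import Counter

SOURCES = ("OS_Locator", "LandReg", "Companies", "NROSH")


def tally_sources(records):
    """Count records by source combination.

    records: iterable of sets naming the sources in which each record was found.
    Returns a Counter keyed by a 0/1 tuple in SOURCES order.
    """
    return Counter(tuple(int(s in found) for s in SOURCES) for found in records)


def as_percentages(counts):
    """Convert a tally into percentages of the total, rounded to 2 places."""
    total = sum(counts.values())
    return {key: round(100 * n / total, 2) for key, n in counts.items()}
```

Each key reads across like a table row: `(1, 1, 0, 0)` means found in OS Locator and Land Registry but not in Companies House or NROSH.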
Weaknesses
• Lack of addresses for Scotland
• Inference not always accurate due to:
  – Non-vehicular streets
  – Streets in close proximity
  – Not all addresses have a street
  – Address elements not unique at postcode sector
• Questions about openness of some data
Conclusion
• More study needed on veracity to:
  – Understand issues in data.
  – Ensure integrity of database.
  – Make more accurate assumptions.
• Crowdsourcing:
  – Same methods could be used to ensure veracity.
  – Could be offered as a free/low-cost service to SMEs.
• Lobbying for more data to be made open.
Questions?
Test drive the prototype.