open addresses symposium slides
Post on 19-Dec-2014
Open Addresses Symposium: Meeting the Challenges
Address inference from open data
John Murray
Sources of Addresses
• Land Registry
• Companies House
• National Social Housing Register (NROSH)
• NHS: GP surgeries, hospitals etc.
• Lists of schools
• Government department asset lists
• Scottish gazetteers (are they really open?)
8 August 2014
Sources of Spatial Information
• Ordnance Survey:
  – Code-Point Open
  – OS Locator
  – OS Gazetteer
  – OS StreetView
  – Named places, settlement seeds, DLUA boundaries, parishes
• ONS:
  – ONSPD postcode directory
  – Built-up areas
  – Census boundaries
• Land Registry:
  – Cadastral polygons (whether they are open is disputed)
• DfT:
  – National Public Transport Gazetteer
Proposal
• Build a street and places gazetteer, to which address points (PAON and SAON) may be attached.
• Use spatial data to verify data loaded from open sources.
• Apply a confidence score to each record based on:
  – Spatial integrity
  – Frequency of appearance within and across sources
• Towns and localities inferred by filling gaps.
• Street layout analysis:
  – Position of buildings by pixel analysis
  – Postcode to numbering, e.g. odds and evens
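The confidence score proposed above could be sketched roughly as follows. This is an illustration only, not the prototype's scoring: the record shape, the 100 m tolerance, the caps and the weights are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    """One appearance of an address in an open source (hypothetical shape)."""
    source: str        # e.g. "LandReg", "Companies", "NROSH"
    distance_m: float  # assumed precomputed: postcode centroid to named street


def confidence(sightings, tolerance_m=100.0,
               w_spatial=0.5, w_sources=0.3, w_repeats=0.2):
    """Return a score in [0, 1]; weights and tolerance are illustrative."""
    if not sightings:
        return 0.0
    # Spatial integrity: share of sightings whose street lies near the postcode.
    near = sum(1 for s in sightings if s.distance_m <= tolerance_m)
    spatial = near / len(sightings)
    # Frequency across sources: distinct sources, capped at three.
    across = min(len({s.source for s in sightings}), 3) / 3
    # Frequency within sources: total appearances, capped at five.
    within = min(len(sightings), 5) / 5
    return w_spatial * spatial + w_sources * across + w_repeats * within
```

A record seen once in each of two sources, both spatially consistent, scores 0.78 under these example weights. In practice the weights would need tuning against data of known quality.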
Pixel Analysis
• Overlay vector streets and postcode centroids on OS StreetView.
• Use in conjunction with OS Locator for context and extent.
• Analyse pixel colour within a buffer on either side of the road to estimate the extent of buildings.
• Can be used to:
  – Ensure veracity of other data
  – Infill missing properties
  – Assign streets to postcodes more accurately
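The buffer analysis described above might look something like the sketch below. It assumes the raster has already been read into rows of (r, g, b) tuples and that the road is a straight segment in pixel coordinates. The building colour and tolerance are placeholder values, not the actual OS StreetView palette.

```python
from math import hypot

# Assumed building fill colour; check the real raster palette before use.
BUILDING_RGB = (255, 220, 175)


def is_building(pixel, target=BUILDING_RGB, tolerance=30):
    """True if every colour channel is within `tolerance` of the target."""
    return all(abs(p - t) <= tolerance for p, t in zip(pixel, target))


def building_share(raster, start, end, buffer_px=5):
    """Share of building-coloured pixels in a buffer on each side of a road.

    raster: list of rows, each a list of (r, g, b) tuples.
    start, end: (x, y) pixel coordinates of a straight road segment.
    Returns (share on the +normal side, share on the -normal side).
    """
    (x0, y0), (x1, y1) = start, end
    length = hypot(x1 - x0, y1 - y0)
    if length == 0:
        return 0.0, 0.0
    # Unit normal to the road direction; its two signs give the two sides.
    nx, ny = -(y1 - y0) / length, (x1 - x0) / length
    steps = int(length) + 1
    hits = {1: 0, -1: 0}
    seen = {1: 0, -1: 0}
    for i in range(steps):
        t = i / max(steps - 1, 1)
        px, py = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
        for side in (1, -1):
            for d in range(1, buffer_px + 1):
                x = round(px + side * d * nx)
                y = round(py + side * d * ny)
                # Ignore sample points that fall outside the raster.
                if 0 <= y < len(raster) and 0 <= x < len(raster[0]):
                    seen[side] += 1
                    hits[side] += is_building(raster[y][x])
    return tuple(hits[s] / seen[s] if seen[s] else 0.0 for s in (1, -1))
```

A high share on one side only would suggest properties along that side of the street, which is the kind of evidence the infill and verification steps could draw on.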
Pixel Analysis [figure slide; image not reproduced in transcript]
Adding Land Registry [figure slide; image not reproduced in transcript]
Maximising Available Data
• Use ONSPD to correct postcodes where a terminated postcode has an unambiguous coordinate match to a new one.
  – This accounts for 50% of retired codes.
• Correct misspellings by reference to dictionaries using lexical analysis.
• Refer to earlier versions of the data.
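The two corrections above can be sketched as follows. The row layout is a simplified stand-in for ONSPD records, and the spelling step uses Python's difflib similarity matching in place of whatever lexical analysis the prototype uses. The function names, the cutoff and the sample values are assumptions.

```python
from collections import defaultdict
from difflib import get_close_matches


def remap_retired(rows):
    """Map each terminated postcode to a live one at the same coordinates.

    rows: iterable of (postcode, easting, northing, terminated) tuples.
    A mapping is kept only when exactly one live postcode shares the
    retired postcode's coordinates, i.e. the match is unambiguous.
    """
    rows = list(rows)
    live_at = defaultdict(set)
    for postcode, e, n, terminated in rows:
        if not terminated:
            live_at[(e, n)].add(postcode)
    mapping = {}
    for postcode, e, n, terminated in rows:
        candidates = live_at.get((e, n), set())
        if terminated and len(candidates) == 1:
            mapping[postcode] = next(iter(candidates))
    return mapping


def correct_spelling(name, dictionary, cutoff=0.85):
    """Return the closest dictionary entry, or `name` if none is close."""
    matches = get_close_matches(name.upper(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else name
```

For example, with made-up street names, `correct_spelling("High Stret", ["HIGH STREET", "HILL STREET"])` returns `"HIGH STREET"`. A retired postcode whose coordinates are shared by two live postcodes is left unmapped rather than guessed.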
Source Audits
• Land Registry: good quality, kept up to date, few errors. Covers England and Wales.
• Companies House: data quality issues, particularly for older companies. Covers the UK.
• NROSH: variable quality. Covers England.
Prototype
• Contains all current GB postcodes.
• Streets added where possible.
• Localities added where possible.
• Corrects retired postcodes where possible.
• Shows nearest postcodes if not.
• Built from 4 sources, with gaps filled by inference.
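The "nearest postcodes" fallback could be approximated with a brute-force search over postcode centroids, as in this sketch. The function name and the dictionary layout are assumptions. A lookup across all GB postcodes would more likely use a spatial index.

```python
from math import hypot


def nearest_postcodes(query, centroids, k=3):
    """Return the k postcodes whose centroids lie closest to `query`.

    query: (easting, northing) in metres, e.g. a retired postcode's
    last known coordinate.
    centroids: dict mapping postcode -> (easting, northing).
    """
    qx, qy = query
    ranked = sorted(
        centroids,
        key=lambda pc: hypot(centroids[pc][0] - qx, centroids[pc][1] - qy),
    )
    return ranked[:k]
```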
Initial Results

OS_Locator  LandReg  Companies  NROSH      Count  Percent
0           0        0          1          5,042    0.29%
0           0        1          0         18,990    1.09%
0           0        1          1            606    0.03%
0           1        0          0        111,553    6.40%
0           1        0          1         25,514    1.46%
0           1        1          0         20,842    1.20%
0           1        1          1          5,574    0.32%
1           0        0          0        227,971   13.07%
1           0        0          1         41,773    2.40%
1           0        1          0        115,449    6.62%
1           0        1          1          7,065    0.41%
1           1        0          0        381,917   21.90%
1           1        0          1        166,608    9.55%
1           1        1          0        348,669   19.99%
1           1        1          1        122,299    7.01%
Unmatched                                144,218    8.27%
Total                                  1,744,090  100.00%
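A breakdown like the one above can be produced by tallying, for each record, which sources it appears in. The sketch below assumes each record has been reduced to a set of source names, with an empty set meaning unmatched. The function name and input shape are illustrative.

```python
from collections import Counter

SOURCES = ("OS_Locator", "LandReg", "Companies", "NROSH")


def tally(records):
    """Count records by the combination of sources they appear in.

    records: iterable of sets of source names; an empty set is unmatched.
    Returns (flags, count, percent) rows, where flags is a tuple of 0/1
    values in SOURCES order, sorted by flags.
    """
    counts = Counter(tuple(int(s in rec) for s in SOURCES) for rec in records)
    total = sum(counts.values())
    return [(flags, n, round(100 * n / total, 2))
            for flags, n in sorted(counts.items())]
```

In this layout the all-zero row corresponds to the "Unmatched" line of the table.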
Weaknesses
• Lack of addresses for Scotland
• Inference not always accurate due to:
  – Non-vehicular streets
  – Streets in close proximity
  – Not all addresses have a street
  – Address elements not unique at postcode sector
• Questions about openness of some data
Conclusion
• More study needed on veracity to:
  – Understand issues in the data
  – Ensure integrity of the database
  – Make more accurate assumptions
• Crowdsourcing:
  – The same methods could be used to ensure veracity.
  – Could be offered as a free or low-cost service to SMEs.
• Lobbying for more data to be made open.
Questions?
Test drive the prototype.