geocoding overview

OpenCage FOSSGIS 2015 http://worldwideberlin.com/

Upload: lokku

Post on 14-Jul-2015

Overview

I. place name disambiguation (homonyms)
– with & without spellcheck

II. Nominatim

III. other (open data) geocoders
– 2015 trends
– opportunities to share data, config, tests

IV. shared ranking/scoring data

OpenCage Geocoder

Which Münster do you mean?

Nominatim geocoder

Mühlheim vs Mülheim

“eifelturm”

“eiffel turm”

“eiffeltower” => no result

“eifel tower”

=> fairground in Varna, Bulgaria (fixed last week)

“eiffel tower”

=> one in Paris

=> replicas around the world

=> restaurants around the world
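
The misspelled queries above ("eifelturm", "eiffeltower", "eifel tower") can be caught with a simple fuzzy lookup. The following is a minimal sketch, not how any of the geocoders in this talk work: it assumes a tiny, hypothetical list of known names (`KNOWN_NAMES`) and uses Python's standard `difflib` for the similarity score.

```python
import difflib

# Hypothetical mini gazetteer; a real geocoder would draw names from its index.
KNOWN_NAMES = [
    "eiffel tower",
    "eiffelturm",
    "tour eiffel",
    "mülheim",
    "mühlheim",
    "münster",
]


def suggest(query, names=KNOWN_NAMES, cutoff=0.8):
    """Return up to three known names that closely resemble the query."""
    # Lowercase and collapse whitespace before comparing.
    normalized = " ".join(query.lower().split())
    return difflib.get_close_matches(normalized, names, n=3, cutoff=cutoff)


if __name__ == "__main__":
    for q in ("eifelturm", "eiffeltower", "eifel tower"):
        print(q, "->", suggest(q))
```

The cutoff is a tuning knob: set it too low and "Mühlheim" silently turns into "Mülheim", which are different towns.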

http://www.openstreetmap.org/#map=17/39.80885/116.28163

Nominatim

● OSM data, minutely updates
● + UK postal codes, TIGER
● 1TB PostGIS
● import in C, setup scripts in PHP, Postgres stored procedures, PHP frontend, Python & PHP test suite
● autocomplete if you add Photon geocoder
● no spellcheck

regression/blackbox tests

other geocoders

Closed source           Open source, high resources   Open source, low resources
Google Maps             Mapzen “Pelias”               OpenStreetMap “Nominatim”
Bing/Yahoo              Mapbox “Carmen”               OpenCage (multiple)
Mapquest                Mapquest open (Nominatim)     geonames
ESRI/ArcGIS Online      Foursquare “Quattroshapes”    geocod.io (Tiger data)
Baidu                   Scout                         Photon (Nominatim)
Yandex                  Cloudmade                     geo.io (Nominatim)
TomTom                                                DSTK (Tiger, geonames)
Amazon (Android only)                                 SmartyStreets
Telenav                                               ...
Nokia/Ovi/Here
Apple (iOS only)
...

trends

● SSD
● Add commercial sources
● Full builds, downloadable index
● High parallel (map/reduce, nodejs), cloud scaling, noSQL
● Community building, guidelines
● Test suites

typical features to improve

● horizontal scaling
● autocomplete
● spellcheck
● improve text parsing (App 3, 111-113b)
● crossings (Main & 2nd N, New Orleans)
● “4km north of $cityname on the N6”
● tests for non-latin alphabets
● postal code boundaries
● localsearch/POIs
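
As an illustration of the "4km north of $cityname on the N6" item, here is a rough sketch of a parser for such relative queries. The pattern, the function name `parse_relative` and the output keys are all assumptions made for this example; it only handles English phrasing with kilometres and the four cardinal directions.

```python
import re

# Hypothetical pattern for "<distance>km <direction> of <place> [on the <road>]".
RELATIVE = re.compile(
    r"^(?P<distance>\d+(?:\.\d+)?)\s*km\s+"
    r"(?P<direction>north|south|east|west)\s+of\s+"
    r"(?P<place>.+?)"
    r"(?:\s+on\s+the\s+(?P<road>\S+))?$",
    re.IGNORECASE,
)


def parse_relative(query):
    """Split a relative-location query into parts, or return None if it does not fit."""
    match = RELATIVE.match(query.strip())
    if match is None:
        return None
    return {
        "distance_km": float(match["distance"]),
        "direction": match["direction"].lower(),
        "place": match["place"],
        "road": match["road"],  # None when the query names no road
    }


if __name__ == "__main__":
    print(parse_relative("4km north of Galway on the N6"))
```

A geocoder would still have to resolve the place name and then offset the result along the road, which is the hard part.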

what should be shared

● aka. don't reinvent everything
● standard test suite to compare geocoders
● hierarchy data
● address parsing
● address formatting
● language configuration
● data parsing, e.g. OSM tags
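
A shared test suite could be as simple as a list of queries with expected coordinates and a tolerance, run as a blackbox test against any geocoder. The sketch below assumes a hypothetical case format (`TEST_CASES`) and a geocoder exposed as a plain function returning `(lat, lon)` or `None`; the coordinates are approximate and `fake_geocoder` is only a stand-in.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in kilometres."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


# Hypothetical shared cases: query, approximate expected position, tolerance.
TEST_CASES = [
    {"query": "Eiffel Tower, Paris", "lat": 48.8584, "lon": 2.2945, "max_km": 1.0},
    {"query": "Münster, Germany", "lat": 51.9625, "lon": 7.6256, "max_km": 10.0},
]


def run_suite(geocode, cases=TEST_CASES):
    """Run every case against a geocoder callable; map query -> passed?"""
    results = {}
    for case in cases:
        hit = geocode(case["query"])
        results[case["query"]] = hit is not None and (
            haversine_km(hit[0], hit[1], case["lat"], case["lon"]) <= case["max_km"]
        )
    return results


def fake_geocoder(query):
    """Stand-in geocoder that only knows one place."""
    return {"Eiffel Tower, Paris": (48.8583, 2.2944)}.get(query)


if __name__ == "__main__":
    print(run_suite(fake_geocoder))
```

Because the suite only sees queries and coordinates, the same cases could be pointed at Nominatim, Pelias or any other geocoder to compare them.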

openaddresses.io

● 110m addresses
● 10GB of text files

1174 SMITH CREEK WAY, BRASSFIELD, WAKE FOREST, NC 27587

732 STEWARTS ROAD, LANEXA, VA 23124
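
To show what address parsing involves even for well-behaved input, here is a small sketch that splits lines shaped like the two samples above. It assumes the layout "house number and street, one or more localities, state and 5-digit ZIP"; the function name and output keys are made up for this example and say nothing about the actual openaddresses.io file format.

```python
import re

# Assumed shape of the last comma-separated part: two-letter state + 5-digit ZIP.
STATE_ZIP = re.compile(r"^(?P<state>[A-Z]{2})\s+(?P<postcode>\d{5})$")


def parse_us_line(line):
    """Split a US-style address line into components (sketch, not robust)."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) < 3:
        raise ValueError("expected at least street, locality and state/ZIP")
    match = STATE_ZIP.match(parts[-1])
    if match is None:
        raise ValueError("last part is not '<STATE> <ZIP>'")
    house_number, _, road = parts[0].partition(" ")
    return {
        "house_number": house_number,
        "road": road,
        "localities": parts[1:-1],  # one or more names, most specific first
        "state": match["state"],
        "postcode": match["postcode"],
    }


if __name__ == "__main__":
    print(parse_us_line("1174 SMITH CREEK WAY, BRASSFIELD, WAKE FOREST, NC 27587"))
```

The first sample already has two locality names where the second has one, so even this clean data does not map onto fixed fields.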

address formatting

https://github.com/lokku/address-formatting/

– configuration
– test cases for 33 countries
– reference implementation in Perl

{ country_code: 'dk', village: 'Ærøskøbing', county: 'Ærø Municipality', house_number: '17A', neighbourhood: 'Paradiset', postcode: '5970', road: 'Baggårde', state: 'Region of Southern Denmark' }

Baggårde 17A, 5970 Ærøskøbing, Denmark

Adama Asnyka 1, 59-700 Bolesławiec, Poland

CAI, Cerrito 1250, Retiro, C1010AAZ Buenos Aires, Argentina
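
The idea of turning address components into a country-specific string can be sketched in a few lines. This is not the address-formatting project's code or template format: `TEMPLATES`, `COUNTRY_NAMES` and `format_address` are simplified assumptions for illustration, with a single hand-written template for Denmark.

```python
from collections import defaultdict

# Hypothetical, heavily simplified per-country templates (one entry per output part).
TEMPLATES = {
    "dk": ["{road} {house_number}", "{postcode} {place}", "{country}"],
    "default": ["{house_number} {road}", "{place} {postcode}", "{country}"],
}
COUNTRY_NAMES = {"dk": "Denmark"}  # assumed lookup table
PLACE_KEYS = ("city", "town", "village")  # first one present is used as the place


def format_address(components):
    """Render address components with the template for their country."""
    code = components.get("country_code", "").lower()
    fields = dict(components)
    fields["place"] = next((components[k] for k in PLACE_KEYS if k in components), "")
    fields.setdefault("country", COUNTRY_NAMES.get(code, ""))
    parts = []
    for template in TEMPLATES.get(code, TEMPLATES["default"]):
        # Missing components render as empty strings instead of raising.
        text = " ".join(template.format_map(defaultdict(str, fields)).split())
        if text:
            parts.append(text)
    return ", ".join(parts)


if __name__ == "__main__":
    print(format_address({
        "country_code": "dk", "village": "Ærøskøbing", "county": "Ærø Municipality",
        "house_number": "17A", "neighbourhood": "Paradiset", "postcode": "5970",
        "road": "Baggårde", "state": "Region of Southern Denmark",
    }))
```

Components the template does not mention (county, neighbourhood, state) are simply dropped, which matches the Danish example above.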

wikipedia data

core geocoding logic

1. tokenize

2. filter
• fixed bounding box, browser window, country
• OSM tags/POI search
• min-max admin

3. search

4. rank
• country bias
• language bias (client, explicit)
• location boost (client, explicit, history)
• maybe: spellcheck
• maybe: retry/failover/remove phrases
• importance boost
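
The four steps above can be sketched end to end on a toy in-memory index. Everything here is an assumption for illustration: the `Place` records, their importance values, the 0.5 country boost, and the exact-token matching used as "search". No real geocoder is this simple.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Place:
    name: str
    country: str
    importance: float  # hypothetical score between 0 and 1


# Toy index with made-up importance values.
PLACES = [
    Place("Münster", "de", 0.7),
    Place("Munster", "fr", 0.4),
    Place("Munster", "us", 0.3),
]


def tokenize(text):
    """Step 1: lowercase, strip accents, split on whitespace and commas."""
    folded = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    return folded.replace(",", " ").split()


def geocode(query, country=None, bias_country=None):
    tokens = set(tokenize(query))
    # Step 2: hard filter, here only by country.
    pool = [p for p in PLACES if country is None or p.country == country]
    # Step 3: search, here "all name tokens occur in the query".
    hits = [p for p in pool if set(tokenize(p.name)) <= tokens]

    # Step 4: rank by importance plus a soft country bias.
    def score(place):
        boost = 0.5 if place.country == bias_country else 0.0
        return place.importance + boost

    return sorted(hits, key=score, reverse=True)


if __name__ == "__main__":
    print([p.country for p in geocode("Münster")])
    print([p.country for p in geocode("munster", bias_country="us")])
```

The difference between step 2 and step 4 is the point: a filter removes candidates outright, while a bias only reorders them.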

http://blog.mayflower.de/755-Schnelle-Volltextsuche-mit-Solr.html

map to hierarchy (ranks)

http://wiki.openstreetmap.org/wiki/Nominatim/Development_overview

names, names, names

name is one of many factors

ranking examples:

● Altona
– type: suburb vs train station vs town in US/Canada

● Germany
– admin_level=2 (country) vs island

● Mt Everest
– importance: viewpoint vs peak vs island

● Oktoberfest
– actually an alt_name of Theresienwiese

● Königsberg
– 10x a peak, 1x old_name of Kaliningrad

● Hitlerberg
– old_name:1934-1945 of Heigelkopf
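
The Königsberg case shows why the kind of name that matched has to feed into the score. A rough sketch, assuming made-up weights per OSM name tag (`NAME_TAG_WEIGHT`) and made-up importance values for two candidates:

```python
# Hypothetical weights: a match on the primary name counts more than on old_name.
NAME_TAG_WEIGHT = {"name": 1.0, "alt_name": 0.8, "old_name": 0.6}

# Two illustrative candidates with invented importance values.
CANDIDATES = [
    {"label": "Kaliningrad (city)",
     "tags": {"name": "Kaliningrad", "old_name": "Königsberg"},
     "importance": 0.7},
    {"label": "Königsberg (peak)",
     "tags": {"name": "Königsberg"},
     "importance": 0.2},
]


def score(candidate, query):
    """Weight of the best-matching name tag times the place's importance."""
    best = 0.0
    for tag, weight in NAME_TAG_WEIGHT.items():
        if candidate["tags"].get(tag, "").lower() == query.lower():
            best = max(best, weight)
    return best * candidate["importance"]


def rank(query, candidates=CANDIDATES):
    """Return labels of matching candidates, best first."""
    scored = [(score(c, query), c["label"]) for c in candidates]
    return [label for s, label in sorted(scored, reverse=True) if s > 0]


if __name__ == "__main__":
    print(rank("Königsberg"))
```

With these numbers the city wins despite matching only on old_name; with a lower old_name weight the peak would win, so the weights are a product decision and not a fact about the data.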

status on wikipedia_articles.bin

● version 1: wikipedia pageview logs
– https://en.wikipedia.org/wiki/Wikipedia:Notability

● version 2 (current): parsing wikipedia articles and counting links
– last updated 2013
– 80m wikipedia entries + 15m redirects
– 0.6m places in OSM have wikipedia tag set (2013: 0.4m)

● version 3 (TBD): parsing wikipedia geo exports
– http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Georeferenzierung/Hauptseite/Wikipedia-World/en
– 3.4m entries, more languages, regular dumps, new documentation

● version 4 (?)
– used wikidata exports
– used by multiple geocoders
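
Counting links only becomes useful for ranking once the counts are squeezed into a comparable score. One possible way, shown here purely as an assumption and not as the formula behind wikipedia_articles.bin, is logarithmic scaling against the most-linked article. The link counts in the example are invented.

```python
import math


def importance_from_links(link_counts):
    """Map raw inbound-link counts to scores in [0, 1] using log scaling."""
    if not link_counts:
        return {}
    top = max(link_counts.values())
    if top <= 0:
        return {title: 0.0 for title in link_counts}
    scale = math.log1p(top)
    return {title: math.log1p(count) / scale for title, count in link_counts.items()}


if __name__ == "__main__":
    # Invented counts, for illustration only.
    counts = {"Berlin": 100000, "Münster": 8000, "Heigelkopf": 3}
    for title, value in importance_from_links(counts).items():
        print(f"{title}: {value:.2f}")
```

The logarithm keeps a handful of very heavily linked articles from flattening every other place to near zero.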

what can mappers do?

● add wikipedia tags
● fix administrative levels
● don't add wrong names (typos)
● file bugs (github)

http://nominatim.openstreetmap.org/

… and if all fails: rename city

Questions?
