no(geo)sql
An overview of how to handle Geo in a DBMS from a NoSQL point of view, plus the Hibernate Search spatial module
No(Geo)SQL
Geographic search in (No)SQL
Me
8 years at mappy.com as platform architect and deputy CTO
Founding partner of NovaCodex since 2008
@NHelleringer
Why
Geo in databases
What's the point ?
Why
Geo in databases challenges
Data is complex to store in SQL
Data is two-dimensional
Data is dense
Data is huge
Origin (challenge)
Multiple dimensions, but B-trees sort on only one
Query-dependent index sorting calculation
New data structures and algorithms to handle dimensions
A two-phase search : select, then filter
Origin (needs)
Geographic Information Systems
Handling of geometric objects
Geography entered information systems because administrations needed to handle real-world data :
Geology / Geography
Roads, administrative areas for cadastral surveys
Census data
Infrastructure elements (water delivery network, electrical delivery network, communication network)
Other needs emerged once the data became available, and they use the same tools :
Geo marketing (market areas)
How
All you ever hated about SQL … and more !
Complex SQL additions
Full size complex normalized API
Vendor dependent implementations
Not scalable
Current Implementations (traditional DBMS)
Oracle
  Quad Trees / R-Trees
  Oracle 4 side dev (1984), integrated in Oracle 7 (1992)
SQL Server
  4-level Grid Index
  Since the 2008 version (2007)
Spatialite
  R-Trees
  Since 3.6.0 (Mar 2008)
PostgreSQL
  R-tree-over-GiST
  Since PostGIS 1.0 for 8.0 (Apr 2005)
The Open Geospatial Consortium publishes a standard : OpenGIS
MySQL since Feb 2005, DB2 Spatial Extender since July 2006; Ingres added support very recently
Hibernate Spatial provides generic access to OpenGIS implementations
GIS software such as ESRI, MapInfo, GeoConcept and QuantumGIS uses this standard to access data
Puzzled ?
Do we need all this ?
Is Geo only for geo centric companies ?
How
LBS changed everything !
Maps, geocoding & route planning available
Platforms handle millions of hits/day
Available through multiple APIs
Often for free
How
GEOCODING
Data is huge
Not a geo problem
Expertise extremely valued
Provided
MAPS
Data is huge and complex objects
Indexing is geo
Processing capabilities required
Provided
ROUTE PLANNING
Data is huge
Not a geo problem
Not shardable
Provided
POI SEARCH
Data is less huge (your business size)
Indexing is geo
May shard
Less relevant
Origin (needs)
Location-aware data
Handling of data associated with a latitude/longitude tuple
Location became a search criterion :
Geo search
  The map (the geography) is the center of the search process
Proximity search
  The location is one of many criteria to refine a search
New Solutions ?
Does NoSQL help ?
Geo as a NoSQL Technology
Why does Geo fit a NoSQL approach ?
Geo does not fit in a traditional ‘pure’ DBMS : under first normal form (1NF), many dimensions in one column break the rules
(48,23) <?> (47,25)
Geo objects are hard to define strictly with SQL types : they are fickle
Tim Anglade ‘No SQL for fun and profit’ : Geo/hierarchical is one of seven forms of NoSQL to date
Extensions to SQL or NoSQL data stores
  Quad-trees
  R-trees
Geo as a NoSQL Technology
quad-tree
How does it work ?
Search steps
1) Select
  Compute level
  Compute box ids
  Fetch boxes
2) Filter
  Compute distance
  Select result set
Limits
  High levels
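The select phase above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class `QuadTreeSketch`, its method names, and the `level:x:y` label format are hypothetical and do not come from any of the products named in these slides. The grid simply splits the [-180, 180] x [-90, 90] plane into 2^level cells per axis. The filter phase (distance computation on the fetched candidates) is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the "select" phase of a quad-tree search. */
final class QuadTreeSketch {

    /** Number of cells per axis at a given level: 2^level. */
    static int cellsPerAxis(int level) {
        return 1 << level;
    }

    /** Compute level: deepest level whose cell width (in degrees) still covers the box width. */
    static int levelFor(double boxWidthDegrees, int maxLevel) {
        int level = 0;
        while (level < maxLevel && 360.0 / cellsPerAxis(level + 1) >= boxWidthDegrees) {
            level++;
        }
        return level;
    }

    /** Index of the cell containing value on an axis spanning [min, max]. */
    static int cellIndex(double value, double min, double max, int level) {
        int n = cellsPerAxis(level);
        int index = (int) Math.floor((value - min) / (max - min) * n);
        // Clamp so that value == max falls in the last cell.
        return Math.max(0, Math.min(n - 1, index));
    }

    /** Label of the cell containing a point, formatted as "level:x:y". */
    static String cellId(double latitude, double longitude, int level) {
        int x = cellIndex(longitude, -180.0, 180.0, level);
        int y = cellIndex(latitude, -90.0, 90.0, level);
        return level + ":" + x + ":" + y;
    }

    /** Compute box ids: labels of every cell at this level that intersects the bounding box. */
    static List<String> cellIdsForBox(double minLat, double minLon,
                                      double maxLat, double maxLon, int level) {
        int x0 = cellIndex(minLon, -180.0, 180.0, level);
        int x1 = cellIndex(maxLon, -180.0, 180.0, level);
        int y0 = cellIndex(minLat, -90.0, 90.0, level);
        int y1 = cellIndex(maxLat, -90.0, 90.0, level);
        List<String> ids = new ArrayList<>();
        for (int x = x0; x <= x1; x++) {
            for (int y = y0; y <= y1; y++) {
                ids.add(level + ":" + x + ":" + y);
            }
        }
        return ids;
    }
}
```

With this grid, `QuadTreeSketch.cellId(48, 23, 3)` and `QuadTreeSketch.cellId(47, 25, 3)` return the same label, so both points would be fetched by a single lookup on that label. The labels returned by `cellIdsForBox` are the keys to fetch from whatever index stores them, for instance as tokens in a text index.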
r-tree
Current Implementations (NoSQL databases)
Spatial Lucene/Solr, Elastic Search
  Quad-tree labels in Lucene tokens
  Tile indices or GeoHash labels
GeoCouch
  R-tree in Erlang
Neo4J Spatial
  R-tree & quad-tree
  Objects can be stored as graph elements
Current Implementations (NoSQL databases)
MongoDB
  Geo hashes into MongoDB B-trees
  Shard support incoming
  Spherical model since 1.7
Pincaster
  In-memory quad tree
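To make the idea of geo hashes stored in a one-dimensional B-tree more concrete, here is a minimal Java sketch of the public geohash text encoding (interleaved longitude/latitude bits, base-32 alphabet). The class and method names are hypothetical, and this is the generic algorithm, not the internal representation of MongoDB or Lucene, which may differ.

```java
/** Hypothetical helper: generic geohash text encoding, for illustration only. */
final class GeoHashSketch {

    // Standard geohash base-32 alphabet (no a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a latitude/longitude pair into a geohash of the requested length. */
    static String encode(double latitude, double longitude, int length) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder();
        boolean longitudeBit = true; // bits alternate, starting with longitude
        int bitCount = 0;
        int current = 0;

        while (hash.length() < length) {
            double[] range = longitudeBit ? lonRange : latRange;
            double value = longitudeBit ? longitude : latitude;
            double mid = (range[0] + range[1]) / 2.0;
            if (value >= mid) {
                current = (current << 1) | 1; // upper half: emit 1
                range[0] = mid;
            } else {
                current = current << 1;       // lower half: emit 0
                range[1] = mid;
            }
            longitudeBit = !longitudeBit;

            // Every 5 bits become one base-32 character.
            if (++bitCount == 5) {
                hash.append(BASE32.charAt(current));
                bitCount = 0;
                current = 0;
            }
        }
        return hash.toString();
    }
}
```

Because each extra character only refines the previous cell, a longer hash always starts with the shorter one for the same point. That prefix property is what lets a sorted structure such as a B-tree answer "all points in this cell" with a simple range or prefix scan.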
How
How do I build PoI search ?
POI Search
Do it in pure SQL !!
Use a clustered (longitude, latitude) index :
o Select is done by the cluster on longitude (which is more selective than latitude !)
o Bounding box requests are handled at the index level, as latitude is included
o Filtering by distance calculation can be done by a stored procedure on the database side or in application code
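The pure SQL approach can be sketched as follows, with the filter done in application code. Everything named here is an assumption made for illustration: the table `poi`, its columns, the class `ProximitySketch`, and the spherical Earth model with a mean radius of 6371 km. The bounding box formula is only valid away from the poles and the antimeridian.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: bounding-box select, then distance filter. */
final class ProximitySketch {

    static final double EARTH_RADIUS_KM = 6371.0; // assumed mean radius, spherical model

    // Select phase: hypothetical table and column names, served by a (longitude, latitude) index.
    static final String SELECT_SQL =
            "SELECT id, latitude, longitude FROM poi "
          + "WHERE longitude BETWEEN ? AND ? AND latitude BETWEEN ? AND ?";

    /** Returns {minLat, minLon, maxLat, maxLon} of a box enclosing the search circle. */
    static double[] boundingBox(double lat, double lon, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // radius as an angle, in radians
        double latDelta = Math.toDegrees(angular);
        double lonDelta = Math.toDegrees(
                Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        return new double[] {lat - latDelta, lon - lonDelta, lat + latDelta, lon + lonDelta};
    }

    /** Great-circle distance between two points, using the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Filter phase: keeps the {lat, lon} candidates that really lie within the radius. */
    static List<double[]> filterByDistance(List<double[]> candidates,
                                           double lat, double lon, double radiusKm) {
        List<double[]> result = new ArrayList<>();
        for (double[] c : candidates) {
            if (haversineKm(lat, lon, c[0], c[1]) <= radiusKm) {
                result.add(c);
            }
        }
        return result;
    }
}
```

The four values returned by `boundingBox` would be bound to the `?` parameters of `SELECT_SQL` (longitude bounds first, then latitude bounds). The rows that come back include the corners of the box, which are farther away than the radius, so `filterByDistance` is what trims the candidate set down to the final result.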
POI Search
Lucene via Hibernate Search
o Available in 4.2 beta 1
o Annotation based
o Simple to step in
o Refine by usage
o DSL supported
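Sample indexing code
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Latitude;
import org.hibernate.search.annotations.Longitude;
import org.hibernate.search.annotations.Spatial;

@Indexed
@Spatial
public class Hotel {
    // Coordinates picked up by the class-level @Spatial annotation
    @Latitude
    Double latitude;

    @Longitude
    Double longitude;

    [...]
}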
Sample search code
// Build a query builder for the indexed entity
QueryBuilder builder = fullTextSession.getSearchFactory()
    .buildQueryBuilder().forEntity( PoI.class ).get();

// Center of the search
double centerLatitude = 24;
double centerLongitude = 31.5;

// Match entities located within 50 km of the center
Query luceneQuery = builder.spatial()
    .onCoordinates( PoI.class.getName() )
    .within( 50, Unit.KM )
    .ofLatitude( centerLatitude )
    .andLongitude( centerLongitude )
    .createQuery();
End !
Thank you for listening !
Ref
http://www.slideshare.net/timanglade/nosql-for-fun-profit
http://en.wikipedia.org/wiki/First_normal_form
http://en.wikipedia.org/wiki/Quadtree
http://technet.microsoft.com/en-us/library/bb964712.aspx
http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html
http://vmx.cx/cgi-bin/blog/index.cgi/geocouch-geospatial-queries-with-couchdb:2008-10-26:en,CouchDB,Python,geo
http://wiki.neo4j.org/content/Neo4j_Spatial
http://www.osgeo.org/
http://relation.to/Bloggers/SpatialQueriesFirstBetaForHibernateSearch42IsAvailable
http://www.novacodex.net/