The Weather of the Century: Visualization
DESCRIPTION
MongoDB natively supports geospatial indexing and querying, and it integrates easily with open-source visualization tools. In this webinar, learn high-performance techniques for querying and retrieving geospatial data, and how to create a rich visual representation of global weather data using Python, Monary, and Matplotlib.
TRANSCRIPT
A. Jesse Jiryu Davis
The Weather of the Century: Visualization
Senior Python Engineer, MongoDB
@jessejiryudavis
Serious MongoDB Talk
This Talk
Where’s the data from?
How Much Is There?
Visualization
Visualization Pipeline
MongoDB → PyMongo → Python dicts → NumPy → Matplotlib, with SciPy for interpolation
{
  ts: ISODate("1991-01-01T00:00:00Z"),
  position: {
    type: "Point",
    coordinates: [ -94.6, 39.117 ]
  },
  airTemperature: {
    value: 45,
    quality: "1"
  }
}
GeoJSON
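Because `position` is stored as a GeoJSON Point, the filter passed to `find()` can use MongoDB's geospatial operators. A minimal sketch, assuming a 2dsphere index on `position` (in PyMongo, `collection.create_index([("position", "2dsphere")])`); the helper `bounding_box_query` and the coordinates in the example are hypothetical. `$geoWithin` works without an index, but the index lets MongoDB avoid examining every document. Under spherical geometry the polygon's edges are great-circle arcs, so a large box is not exactly a latitude/longitude rectangle.

```python
# Sketch: build filter documents for the GeoJSON "position" field.
# Assumes (hypothetically) a 2dsphere index on "position", e.g. created with
# PyMongo: collection.create_index([("position", "2dsphere")])
from datetime import datetime


def bounding_box_query(lon_min, lat_min, lon_max, lat_max, start=None, end=None):
    """Return a MongoDB filter matching Points inside a lon/lat box.

    The box is a GeoJSON Polygon: a list of rings, where the exterior ring
    lists [longitude, latitude] pairs counterclockwise and repeats its first
    vertex at the end to close the ring.
    """
    ring = [
        [lon_min, lat_min],
        [lon_max, lat_min],
        [lon_max, lat_max],
        [lon_min, lat_max],
        [lon_min, lat_min],  # closing vertex, identical to the first
    ]
    query = {
        "position": {
            "$geoWithin": {
                "$geometry": {"type": "Polygon", "coordinates": [ring]}
            }
        }
    }
    # Optionally restrict the time window on the "ts" field.
    if start is not None or end is not None:
        ts_range = {}
        if start is not None:
            ts_range["$gte"] = start
        if end is not None:
            ts_range["$lt"] = end
        query["ts"] = ts_range
    return query


# Hypothetical example: a small box around Kansas City on 1 January 1991.
query = bounding_box_query(-95.0, 39.0, -94.0, 40.0,
                           start=datetime(1991, 1, 1),
                           end=datetime(1991, 1, 2))
```

The resulting dict can be passed directly as `query` to `db.collection.find(query)`.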
import numpy
import pymongo

query = {}  # filter document; {} matches every document
data = []
db = pymongo.MongoClient().my_database

# Copy longitude, latitude, and temperature out of each document.
for doc in db.collection.find(query):
    data.append((
        doc['position']['coordinates'][0],
        doc['position']['coordinates'][1],
        doc['airTemperature']['value']))

arrays = numpy.array(data)
# NumPy column access syntax.
lons = arrays[:, 0]
lats = arrays[:, 1]
temps = arrays[:, 2]
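The loop above first collects a list of tuples and then converts it into a 2-D array. A sketch of an alternative that writes straight into three preallocated NumPy columns follows; it accepts any iterable of dicts shaped like the sample document, so it can be tried on in-memory data. The `count` argument (an upper bound on the number of documents) is an assumption of this sketch, and the second sample document is made up. It is shown for its shape, not as a measured speedup: each document still passes through a Python dict. GeoJSON stores coordinates as [longitude, latitude], which is why index 0 is longitude.

```python
import numpy


def docs_to_columns(docs, count):
    """Copy longitude, latitude, and temperature into three NumPy columns.

    `docs` is any iterable of dicts shaped like the sample weather document.
    `count` is an upper bound on the number of documents; the arrays are
    preallocated to that size and trimmed at the end. Raises IndexError if
    `docs` yields more than `count` documents.
    """
    lons = numpy.empty(count, dtype=numpy.float64)
    lats = numpy.empty(count, dtype=numpy.float64)
    temps = numpy.empty(count, dtype=numpy.float64)
    n = 0
    for doc in docs:
        lon, lat = doc['position']['coordinates']  # GeoJSON order: [lon, lat]
        lons[n] = lon
        lats[n] = lat
        temps[n] = doc['airTemperature']['value']
        n += 1
    # Views of the filled prefix of each array.
    return lons[:n], lats[:n], temps[:n]


# In-memory stand-ins for query results; the second document is made up.
sample = [
    {'position': {'type': 'Point', 'coordinates': [-94.6, 39.117]},
     'airTemperature': {'value': 45, 'quality': '1'}},
    {'position': {'type': 'Point', 'coordinates': [-87.9, 41.98]},
     'airTemperature': {'value': 30, 'quality': '1'}},
]
lons, lats, temps = docs_to_columns(sample, count=len(sample))
```

With a live server, `docs_to_columns(db.collection.find(query), count)` would take the place of the list-building loop.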
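from scipy.interpolate import griddata
from matplotlib import pyplot

# One-degree grid covering the globe.
xs = numpy.linspace(-180, 180, 361)
ys = numpy.linspace(-90, 90, 181)
grid_lons, grid_lats = numpy.meshgrid(xs, ys)

# Interpolate station temperatures onto the grid; x is longitude, y is latitude.
zs = griddata((lons, lats), temps, (grid_lons, grid_lats), method='linear')

pyplot.contour(xs, ys, zs)
pyplot.show()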
Magic!!
Also magic!!
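`griddata` with `method='linear'` fills grid points outside the convex hull of the stations with NaN (its default `fill_value`), and it returns an array shaped like the query grid, `(len(ys), len(xs))`, which is the layout `pyplot.contour(xs, ys, zs)` expects. A sketch on synthetic data shows both; the station positions and the linear temperature field are invented so the result is easy to check, since linear interpolation reproduces a linear field exactly inside the hull.

```python
import numpy
from scipy.interpolate import griddata

# Hypothetical stations: the four corners of a lon/lat box plus random
# interior points, so the convex hull is exactly that box.
rng = numpy.random.default_rng(0)
lons = numpy.concatenate([[-100.0, -80.0, -80.0, -100.0],
                          rng.uniform(-100.0, -80.0, 50)])
lats = numpy.concatenate([[30.0, 30.0, 45.0, 45.0],
                          rng.uniform(30.0, 45.0, 50)])
# Synthetic temperature field that is exactly linear in lon and lat.
temps = 0.5 * lons + 0.2 * lats + 80.0

# One-degree global grid, as in the slide.
xs = numpy.linspace(-180, 180, 361)
ys = numpy.linspace(-90, 90, 181)
grid_lons, grid_lats = numpy.meshgrid(xs, ys)  # both shaped (181, 361)

# Points outside the stations' convex hull come back as NaN.
zs = griddata((lons, lats), temps, (grid_lons, grid_lats), method='linear')
```

Here `zs[130, 90]` is the grid point at longitude -90, latitude 40, inside the box, while `zs[0, 0]` (longitude -180, latitude -90) lies outside it.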
Triangulation
What temperature?
Barycentric Interpolation
What temperature?
Temperatures at the triangle's vertices: 53, 48, 54
Weighted average: 51.1
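The weighted average can be made concrete. Barycentric interpolation writes the query point as a combination of the triangle's three vertices, with weights that sum to 1, and applies the same weights to the vertex temperatures. SciPy documents `griddata(..., method='linear')` as triangulating the input points and interpolating linearly on each triangle, which amounts to this calculation. A minimal NumPy sketch; the vertex coordinates are hypothetical, and the weights 0.5, 0.4, and 0.1 are just one choice that reproduces the slide's 51.1 from 53, 48, and 54 (the slide does not give the actual point).

```python
import numpy


def barycentric_weights(p, a, b, c):
    """Barycentric coordinates (wa, wb, wc) of point p in triangle abc.

    Solves p = wa*a + wb*b + wc*c together with wa + wb + wc = 1. All three
    weights lie in [0, 1] exactly when p is inside (or on) the triangle.
    """
    a, b, c, p = (numpy.asarray(v, dtype=float) for v in (a, b, c, p))
    # Rows: x coordinates, y coordinates, and the "weights sum to 1" constraint.
    m = numpy.array([[a[0], b[0], c[0]],
                     [a[1], b[1], c[1]],
                     [1.0,  1.0,  1.0]])
    return numpy.linalg.solve(m, numpy.array([p[0], p[1], 1.0]))


def interpolate_in_triangle(p, vertices, values):
    """Weighted average of vertex values, weighted by barycentric coordinates."""
    weights = barycentric_weights(p, *vertices)
    return float(weights @ numpy.asarray(values, dtype=float))


# Hypothetical triangle of three stations (x, y) reading 53, 48, and 54.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
readings = [53.0, 48.0, 54.0]

# The point (1.6, 0.3) has weights (0.5, 0.4, 0.1):
# 0.5*53 + 0.4*48 + 0.1*54 = 26.5 + 19.2 + 5.4 = 51.1
estimate = interpolate_in_triangle((1.6, 0.3), triangle, readings)
```

At a vertex the estimate equals that vertex's reading, and at the centroid it is the plain mean of the three readings.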
Interpolation
Contours
Not terrifically fast: PyMongo turns every document into a Python dict before the values ever reach NumPy.
Analyzing large datasets
• Querying: 109k documents per second (on localhost)
• Can we go faster?
• Enter “Monary”
With PyMongo: MongoDB → PyMongo → Python dicts → NumPy → Matplotlib
With Monary: MongoDB → Monary → NumPy → Matplotlib
Monary by David Beach
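import monary

query = {}  # filter document, as in the PyMongo version
connection = monary.Monary()

# Returns one NumPy (masked) array per field, in the order given in `fields`.
arrays = connection.query(
    db='my_database',
    coll='collection',
    query=query,
    fields=[
        'position.coordinates.0',
        'position.coordinates.1',
        'airTemperature.value'],
    types=['float32', 'float32', 'float32'])

lons, lats, temps = arrays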
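Monary hands back NumPy masked arrays rather than plain arrays; per its documentation (worth checking for the exact rules), an entry is masked when a document lacks the field or its value cannot be stored as the requested type. Before gridding, rows with any masked field usually need to be dropped. A sketch using hand-built masked arrays as stand-ins for the query result; the values and masks are hypothetical, so it runs without a server.

```python
import numpy
import numpy.ma as ma

# Hand-built stand-ins for the three masked arrays a Monary query returns.
# Values and masks are hypothetical: True in a mask marks a missing value.
lons = ma.masked_array([-94.6, -87.9, 0.0],
                       mask=[False, False, True], dtype='float32')
lats = ma.masked_array([39.117, 41.98, 51.5],
                       mask=[False, False, False], dtype='float32')
temps = ma.masked_array([45.0, 0.0, 12.0],
                        mask=[False, True, False], dtype='float32')

# A row is usable only if none of its three fields is masked.
valid = ~(ma.getmaskarray(lons) | ma.getmaskarray(lats) | ma.getmaskarray(temps))

# Plain float32 arrays containing only the complete rows.
lons_ok = lons.data[valid]
lats_ok = lats.data[valid]
temps_ok = temps.data[valid]
```

The filtered plain arrays can then go to `griddata` exactly as the PyMongo-derived columns did.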
Monary
• PyMongo: 109k documents per second
• Monary: 817k documents per second
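Put as arithmetic, both figures measured on localhost, the benchmark amounts to roughly a 7.5× speedup. For a hypothetical query returning one million documents, and counting retrieval time only, that is about 9 seconds versus about 1.2 seconds:

```latex
\begin{aligned}
\text{speedup} &= \frac{817\,000\ \text{docs/s}}{109\,000\ \text{docs/s}} \approx 7.5 \\
t_{\text{PyMongo}} &= \frac{10^{6}\ \text{docs}}{109\,000\ \text{docs/s}} \approx 9.2\ \text{s} \\
t_{\text{Monary}} &= \frac{10^{6}\ \text{docs}}{817\,000\ \text{docs/s}} \approx 1.2\ \text{s}
\end{aligned}
```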
Visualization
• Author: David Beach
• Interns: Kyle Suarez, Matt Cotter
• Mentors: A. Jesse Jiryu Davis, Jason Carey
Monary
Recent features:
• Easy installation
• Nested field access
• Aggregation
• Python 3
Monary
• Insert, update, remove
• SSL and authentication mechanisms
• parallelCollectionScan
Monary
Future:
Thanks
• Monary
• NumPy
• SciPy
• Matplotlib
Thank you
#MongoDBWorld