Introduction to the IBM Watson Data Platform
TRANSCRIPT
IBM Watson Data Platform and Open Data
27 February 2017
Margriet Groenendijk | Developer Advocate | IBM Watson Data Platform
@MargrietGr
https://medium.com/ibm-watson-data-lab
About me
Developer Advocate, Data scientist
Previous
Research Fellow at University of Exeter, UK
PhD at VU University Amsterdam, the Netherlands
IBM Watson Data Platform
Connect Discover Accelerate
IBM Watson Data Platform
IBM Bluemix https://console.ng.bluemix.net/
Bluemix
https://console.ng.bluemix.net/
https://github.com/snowch/movie-recommender-demo
https://movie-recommender-demo-margrietgroenendijk-1234.mybluemix.net/
APIs
https://github.com/MargrietGroenendijk/Bristol
Example: Twitter
Example: Watson Tone Analyzer
Emotion
Language style
Social propensities
Analyze how you are coming across to others
Cloudant NoSQL
Cloudant is a database
id firstname lastname dob
1 John Smith 1970-01-01
2 Kate Jones 1971-12-25
{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01" }
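The mapping from a table row to a JSON document can be sketched in a few lines of Python. This is only an illustration using the two rows from the slide; the helper name `row_to_document` is made up for this example and is not part of any Cloudant library.

```python
import json

# Column names and rows as shown in the table on the slide.
columns = ["id", "firstname", "lastname", "dob"]
rows = [
    ("1", "John", "Smith", "1970-01-01"),
    ("2", "Kate", "Jones", "1971-12-25"),
]


def row_to_document(columns, row):
    """Turn one table row into a document, using the id column as _id."""
    record = dict(zip(columns, row))
    document = {"_id": str(record.pop("id"))}
    document.update(record)
    return document


documents = [row_to_document(columns, row) for row in rows]
for document in documents:
    print(json.dumps(document))
```

Each row becomes a self-contained document, so later documents are free to carry extra fields without changing the earlier ones.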
Cloudant is "schemaless"
{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "[email protected]" }
Cloudant is "schemaless"
{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "[email protected]", "confirmed": true }
Cloudant is "schemaless"
{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "[email protected]", "confirmed": true, "tags": ["tall", "glasses"] }
Cloudant is "schemaless"
{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "[email protected]", "confirmed": true, "tags": ["tall", "glasses"], "address" : { "number": 14, "street": "Front Street", "town": "Luton", "postcode": "LU1 1AB" } }
Cloudant is built for the web
▪ Stores JSON documents
▪ Speaks an HTTP API
▪ Lives on the web
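Because Cloudant speaks HTTP, storing a document amounts to sending JSON to a URL. The sketch below only builds a `PUT /{database}/{document id}` request with the Python standard library and does not send it. The account URL `https://ACCOUNT.cloudant.com`, the database name `people`, and the helper name are placeholders assumed for illustration, and authentication is left out.

```python
import json
import urllib.parse
import urllib.request


def build_put_document(base_url, database, document):
    """Build (but do not send) an HTTP PUT request that stores one document."""
    doc_id = urllib.parse.quote(document["_id"], safe="")
    url = "{}/{}/{}".format(base_url.rstrip("/"), database, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder account and database names, not real endpoints.
request = build_put_document(
    "https://ACCOUNT.cloudant.com",
    "people",
    {"_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01"},
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it once credentials are added.
```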
Cloudant is fault tolerant
Cloudant is resilient
"write"
Cloudant is resilient
"ok"
"write"
Cloudant is scalable
Cloudant replicates
Runkeeper
Open Street Map Data
IBM Cloudant: use from anywhere!
Daily updates: VM, daily cron, Python script
Always up to date! Currently 12,467,460 POIs
Several data sources: world, continent, country, city or a user-defined box
Several data formats for which free conversion tools exist: pbf, osm, json, shp
Example:
wget -c http://download.geofabrik.de/europe/netherlands-latest.osm.pbf
Extract the POIs with osmosis (easy to install with brew on Mac)
osmosis --read-pbf netherlands-latest.osm.pbf \
  --tf accept-nodes \
    aerialway=station \
    aeroway=aerodrome,helipad,heliport \
    amenity=* craft=* emergency=* \
    highway=bus_stop,rest_area,services \
    historic=* leisure=* office=* \
    public_transport=stop_position,stop_area \
    shop=* tourism=* \
  --tf reject-ways --tf reject-relations \
  --write-xml netherlands.nodes.osm
Some cleaning up with osmconvert
osmconvert netherlands.nodes.osm --drop-ways --drop-author --drop-relations --drop-versions > netherlands.poi.osm
Convert from osm to json format with ogr2ogr
ogr2ogr -f GeoJSON netherlands.poi.json netherlands.poi.osm points
Upload to Cloudant with couchimport
export COUCH_URL="https://username:[email protected]"
cat netherlands.poi.json | couchimport --db poi-netherlands --type json --jsonpath "features.*"
https://github.com/glynnbird/couchimport
IBM Cloudant
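The slides mention a Python script run by a daily cron job, but do not show it. The following is a rough sketch of how the download, extract, clean, convert and upload steps could be chained for one region. The function names and file-naming scheme are assumptions made for this example, the command-line flags are copied from the commands on the slides, and couchimport is assumed to read the target account from the COUCH_URL environment variable as exported above.

```python
import subprocess

# Tag filters copied from the osmosis command on the slide.
POI_FILTERS = [
    "aerialway=station",
    "aeroway=aerodrome,helipad,heliport",
    "amenity=*", "craft=*", "emergency=*",
    "highway=bus_stop,rest_area,services",
    "historic=*", "leisure=*", "office=*",
    "public_transport=stop_position,stop_area",
    "shop=*", "tourism=*",
]


def build_steps(region, base_url="http://download.geofabrik.de/europe"):
    """Return (command, stdin_path, stdout_path) tuples for one region."""
    pbf = "{}-latest.osm.pbf".format(region)
    nodes = "{}.nodes.osm".format(region)
    poi_osm = "{}.poi.osm".format(region)
    poi_json = "{}.poi.json".format(region)
    return [
        (["wget", "-c", "{}/{}".format(base_url, pbf)], None, None),
        (["osmosis", "--read-pbf", pbf, "--tf", "accept-nodes"] + POI_FILTERS
         + ["--tf", "reject-ways", "--tf", "reject-relations",
            "--write-xml", nodes], None, None),
        # osmconvert writes to stdout, so its output is redirected to a file.
        (["osmconvert", nodes, "--drop-ways", "--drop-author",
          "--drop-relations", "--drop-versions"], None, poi_osm),
        (["ogr2ogr", "-f", "GeoJSON", poi_json, poi_osm, "points"], None, None),
        # couchimport reads the JSON from stdin.
        (["couchimport", "--db", "poi-{}".format(region), "--type", "json",
          "--jsonpath", "features.*"], poi_json, None),
    ]


def run_steps(steps):
    """Run each step in order and stop at the first failing command."""
    for command, stdin_path, stdout_path in steps:
        stdin = open(stdin_path, "rb") if stdin_path else None
        stdout = open(stdout_path, "wb") if stdout_path else None
        try:
            subprocess.run(command, stdin=stdin, stdout=stdout, check=True)
        finally:
            for handle in (stdin, stdout):
                if handle is not None:
                    handle.close()


steps = build_steps("netherlands")
for command, _, _ in steps:
    print(" ".join(command))
# run_steps(steps) would execute the pipeline if the tools are installed.
```

Passing the arguments as a list avoids shell expansion of the `*` wildcards in the tag filters.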
Examples from
https://console.ng.bluemix.net/docs/services/Cloudant/api/cloudant-geo.html#cloudant-geospatial
UK Crime Data from https://data.police.uk/data/
https://opendata.cloudant.com/crimes-uk/_design/spatial/_geo/newGeoIndex?bbox=-2.600283622741699%2C51.44886539765683%2C-2.5962066650390625%2C51.4533851454499&limit=20&relation=contains
Python - requests
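A bounding-box query like the one in the URL above can be assembled and fetched from Python roughly as follows. This is a sketch rather than the code shown in the talk: the helper names are invented for this example, the URL layout simply mirrors the one on the slide, and the `requests` call is kept in a separate function so the URL can be inspected without network access.

```python
from urllib.parse import urlencode


def geo_query_url(base_url, database, design_doc, index, bbox,
                  limit=20, relation="contains"):
    """Build a Cloudant Geo query URL for a (west, south, east, north) box."""
    params = urlencode({
        "bbox": ",".join(str(value) for value in bbox),
        "limit": limit,
        "relation": relation,
    })
    return "{}/{}/_design/{}/_geo/{}?{}".format(
        base_url.rstrip("/"), database, design_doc, index, params)


def fetch_json(url):
    """Fetch the query result; requires the requests package and network access."""
    import requests
    response = requests.get(url)
    response.raise_for_status()
    return response.json()


# Values taken from the example URL on the slide.
url = geo_query_url(
    "https://opendata.cloudant.com", "crimes-uk", "spatial", "newGeoIndex",
    (-2.600283622741699, 51.44886539765683,
     -2.5962066650390625, 51.4533851454499),
)
print(url)
# result = fetch_json(url)  # uncomment to run the query
```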
dashDB Data warehouse
Add the dashDB service in Bluemix
Add a service
Search for dashDB
Search for tweets
posted:2016-08-01,2016-10-01 followers_count:3000 friends_count:3000 (weather OR sun OR sunny OR rain OR hail OR storm OR rainy OR drought OR flood OR hurricane OR tornado OR cold OR snow OR drizzle OR cloudy OR thunder OR lightning OR wind OR windy OR heatwave)
REST API docs: https://new-console.ng.bluemix.net/docs/services/Twitter/twitter_rest_apis.html#rest_apis
4 Select table
Use an existing service
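A long search query such as the weather example is easier to maintain when it is assembled from a list of keywords. The helper below is a small sketch that only builds the query string in the form shown on the slide; the function and parameter names are made up for this example and it does not call the Twitter service.

```python
WEATHER_TERMS = [
    "weather", "sun", "sunny", "rain", "hail", "storm", "rainy", "drought",
    "flood", "hurricane", "tornado", "cold", "snow", "drizzle", "cloudy",
    "thunder", "lightning", "wind", "windy", "heatwave",
]


def build_tweet_query(posted_from, posted_to, followers_count,
                      friends_count, terms):
    """Combine the date range, account filters and OR-ed keywords."""
    keywords = " OR ".join(terms)
    return "posted:{},{} followers_count:{} friends_count:{} ({})".format(
        posted_from, posted_to, followers_count, friends_count, keywords)


query = build_tweet_query("2016-08-01", "2016-10-01", 3000, 3000,
                          WEATHER_TERMS)
print(query)
```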
Apache Spark
RDDs: Resilient Distributed Datasets
Data does not have to fit on a single machine
Data is separated into partitions
Creation of RDDs:
▪ Load an external dataset
▪ Distribute a collection of objects
Transformations construct a new RDD from a previous one (lazy!)
Actions compute a result based on an RDD
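The difference between lazy transformations and actions can be illustrated without a Spark cluster. The class below is a toy stand-in written in plain Python, not Spark's implementation: `ToyRDD` and its methods are invented for this sketch, although the method names deliberately echo PySpark's `parallelize`, `map`, `filter`, `collect` and `count`.

```python
class ToyRDD:
    """Toy model of an RDD: partitions plus a recorded chain of transformations."""

    def __init__(self, partitions, steps=()):
        self.partitions = partitions
        self.steps = tuple(steps)

    @classmethod
    def parallelize(cls, items, num_partitions=2):
        """Distribute a collection of objects over several partitions."""
        items = list(items)
        return cls([items[i::num_partitions] for i in range(num_partitions)])

    def map(self, func):
        """Transformation: only records the step and returns a new ToyRDD."""
        return ToyRDD(self.partitions, self.steps + (("map", func),))

    def filter(self, predicate):
        """Transformation: only records the step and returns a new ToyRDD."""
        return ToyRDD(self.partitions, self.steps + (("filter", predicate),))

    def _compute(self, partition):
        rows = iter(partition)
        for kind, func in self.steps:
            rows = map(func, rows) if kind == "map" else filter(func, rows)
        return list(rows)

    def collect(self):
        """Action: runs the recorded steps on every partition."""
        return [row for part in self.partitions for row in self._compute(part)]

    def count(self):
        """Action: counts the rows left after the recorded steps."""
        return sum(len(self._compute(part)) for part in self.partitions)


calls = []


def square(value):
    calls.append(value)  # record when the function is really executed
    return value * value


rdd = ToyRDD.parallelize(range(1, 7), num_partitions=3)
evens = rdd.map(square).filter(lambda value: value % 2 == 0)
calls_before_action = list(calls)  # still empty: nothing has run yet
result = evens.collect()           # the action triggers the computation
print(calls_before_action, sorted(result))
```

In real Spark the partitions live on different machines and the recorded steps are shipped to them, but the pattern of describing work first and running it only when an action is called is the same.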
Load tweets from dashDB with Spark SQL
Clean data, summarise and load into pandas DataFrame
IBM Data Science Experience
datascience.ibm.com
Getting started
▪ Go to datascience.ibm.com and sign in with your Bluemix account if you have one; otherwise sign up for one at the top right of the screen
Create a project
▪ Click the Create New Project link at the top of the screen
▪ Or go to My Projects in the menu on the left of the screen and click Create New Project
▪ Name the project
▪ Choose a Spark service
▪ Choose an Object Storage
▪ Click Create
Add collaborators
▪ Click add collaborator
▪ Search for your project members
▪ Select Permission
Add a notebook
▪ Click add notebooks
▪ Pick your favourite language: Python 2, Scala or R
▪ Choose Spark 1.6 or 2.0
▪ Click Create Notebook
Let’s write some code
▪ Click the pen icon to start adding code (edit mode)
▪ When collaborating, only one person can edit; others can add comments to the notebook when in view mode
Example: Bristol open data
Object-store
Python package PixieDust
Watson Machine Learning
IBM Watson Data Platform
Bluemix
Data storage
Apps
Watson APIs
Weather
Data Science Experience
Watson Machine Learning
Watson Analytics
Thanks!
https://github.com/MargrietGroenendijk/Bristol
http://www.slideshare.net/MargrietGroenendijk/presentations
https://medium.com/ibm-watson-data-lab