
Posted on 20 March 2017


TRANSCRIPT

Page 1: Introduction to the IBM Watson Data Platform

IBM Watson Data Platform and Open Data

27 February 2017

Margriet Groenendijk | Developer Advocate | IBM Watson Data Platform

@MargrietGr

https://medium.com/ibm-watson-data-lab

Page 2

About me

Developer Advocate, Data scientist

Previous

Research Fellow at University of Exeter, UK

PhD at VU University Amsterdam, the Netherlands

Page 3

IBM Watson Data Platform

Connect, Discover, Accelerate

Page 4

IBM Watson Data Platform

Page 5

IBM Bluemix

https://console.ng.bluemix.net/

Page 6

Bluemix

https://console.ng.bluemix.net/

Page 7

https://github.com/snowch/movie-recommender-demo

Page 8

https://movie-recommender-demo-margrietgroenendijk-1234.mybluemix.net/

Page 9

Page 10

Page 11

Page 12

Page 13

APIs

Page 14

https://github.com/MargrietGroenendijk/Bristol

Page 15

https://github.com/MargrietGroenendijk/Bristol

Page 16

Page 17

Example: Twitter

Page 18

Example: Watson Tone Analyzer

Page 19

Emotion

Language style

Social propensities

Analyze how you are coming across to others
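The three categories on this slide (emotion, language style, social propensities) come back as separate tone categories in the service's JSON response. The sketch below picks the strongest tone in each category. It assumes the v3 response layout with `document_tone.tone_categories`, each holding a `category_id` and a list of `tones` with `tone_name` and `score`; that layout, the helper `strongest_tones`, and all sample scores are assumptions for illustration, so check them against the current Tone Analyzer API reference.

```python
import json

# Hypothetical sample response in the assumed Tone Analyzer v3 layout.
# All scores are made up for illustration.
SAMPLE_RESPONSE = json.loads("""
{
  "document_tone": {
    "tone_categories": [
      {"category_id": "emotion_tone",
       "tones": [{"tone_name": "Joy", "score": 0.71},
                 {"tone_name": "Anger", "score": 0.05}]},
      {"category_id": "language_tone",
       "tones": [{"tone_name": "Analytical", "score": 0.43},
                 {"tone_name": "Confident", "score": 0.12}]},
      {"category_id": "social_tone",
       "tones": [{"tone_name": "Openness", "score": 0.38},
                 {"tone_name": "Agreeableness", "score": 0.64}]}
    ]
  }
}
""")


def strongest_tones(response):
    """Return {category_id: (tone_name, score)} for the top-scoring tone per category."""
    result = {}
    for category in response["document_tone"]["tone_categories"]:
        best = max(category["tones"], key=lambda tone: tone["score"])
        result[category["category_id"]] = (best["tone_name"], best["score"])
    return result


if __name__ == "__main__":
    for category, (tone, score) in strongest_tones(SAMPLE_RESPONSE).items():
        print("{}: {} ({:.2f})".format(category, tone, score))
```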

Page 20

Cloudant

NoSQL

Page 21

Cloudant is a database

id  firstname  lastname  dob
1   John       Smith     1970-01-01
2   Kate       Jones     1971-12-25

{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01" }
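The step from a table row to a JSON document can be sketched with Python's standard library. Using the `id` column as the document's `_id` follows the slide; the helper name `row_to_document` is hypothetical.

```python
import json

# The relational table from the slide.
COLUMNS = ["id", "firstname", "lastname", "dob"]
ROWS = [
    ("1", "John", "Smith", "1970-01-01"),
    ("2", "Kate", "Jones", "1971-12-25"),
]


def row_to_document(columns, row):
    """Turn one table row into a Cloudant-style JSON document.

    The primary-key column becomes the special "_id" field; every other
    column becomes an ordinary top-level field.
    """
    record = dict(zip(columns, row))
    document = {"_id": record.pop("id")}
    document.update(record)
    return document


if __name__ == "__main__":
    for row in ROWS:
        print(json.dumps(row_to_document(COLUMNS, row)))
```

Run on the first row, this prints the same JSON document as the slide.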

Page 22

Cloudant is "schemaless"

{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "[email protected]" }

Page 23

Cloudant is "schemaless"

{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "[email protected]", "confirmed": true }

Page 24

Cloudant is "schemaless"

{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "[email protected]", "confirmed": true, "tags": ["tall", "glasses"] }

Page 25

Cloudant is "schemaless"

{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "[email protected]", "confirmed": true, "tags": ["tall", "glasses"], "address" : { "number": 14, "street": "Front Street", "town": "Luton", "postcode": "LU1 1AB" } }
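Documents in a schemaless database can grow new fields one at a time, so two documents in the same database may not share any fields beyond `_id`. Code that reads them should therefore not assume that a field exists. The minimal sketch below reuses the documents from these slides, with the email field left out. The helpers `town_of` and `has_tag` are hypothetical.

```python
# Two documents from the same database: John's has grown extra fields over
# time (as on the slides above); Kate's still has only the original columns.
DOCS = [
    {"_id": "2", "firstname": "Kate", "lastname": "Jones", "dob": "1971-12-25"},
    {"_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01",
     "confirmed": True, "tags": ["tall", "glasses"],
     "address": {"number": 14, "street": "Front Street",
                 "town": "Luton", "postcode": "LU1 1AB"}},
]


def town_of(doc, default="unknown"):
    """Read the optional nested address.town field without assuming it exists."""
    return doc.get("address", {}).get("town", default)


def has_tag(doc, tag):
    """True if the document carries the tag; documents without "tags" have none."""
    return tag in doc.get("tags", [])


if __name__ == "__main__":
    for doc in DOCS:
        print(doc["_id"], town_of(doc), has_tag(doc, "glasses"))
```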

Page 26

Cloudant is built for the web

▪ Stores JSON documents

▪ Speaks an HTTP API

▪ Lives on the web
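Because Cloudant speaks HTTP, storing and reading a document comes down to a `PUT` and a `GET` on `/{database}/{doc_id}`, as in the CouchDB document API that Cloudant implements. The sketch below only builds the requests with Python's `urllib.request`. The account URL `https://ACCOUNT.cloudant.com` and the database name `people` are placeholders. Actually sending the requests would also need credentials.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://ACCOUNT.cloudant.com"  # placeholder: your Cloudant account URL
DATABASE = "people"                         # hypothetical database name


def doc_url(doc_id):
    """URL of one document: /{database}/{doc_id}, with the id percent-encoded."""
    return "{}/{}/{}".format(BASE_URL, DATABASE, urllib.parse.quote(doc_id, safe=""))


def put_document(doc):
    """Build a PUT request that stores doc under its _id.

    Creating a new document needs only the body; updating an existing one
    also requires its current "_rev" field in the body.
    """
    return urllib.request.Request(
        url=doc_url(doc["_id"]),
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def get_document(doc_id):
    """Build a GET request that fetches a document by its _id."""
    return urllib.request.Request(url=doc_url(doc_id), method="GET")


if __name__ == "__main__":
    john = {"_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01"}
    for request in (put_document(john), get_document("1")):
        print(request.get_method(), request.full_url)
    # Sending needs credentials, e.g. urllib.request.urlopen(request) with an
    # Authorization header added; this sketch stops at building the requests.
```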

Page 27

Cloudant is fault tolerant

Page 28

Cloudant is fault tolerant

Page 29

Cloudant is resilient

"write"

Page 30

Cloudant is resilient

"ok"

"write"

Page 31

Cloudant is scalable

Page 32

Cloudant replicates

Page 33

Cloudant replicates

Page 34

Cloudant replicates

Page 35

Cloudant replicates
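Replication is also driven over HTTP. In the CouchDB-compatible API, POSTing a JSON body with `source` and `target` to the `/_replicate` endpoint starts a one-way replication. Alternatively, the same body can be saved as a document in the `_replicator` database. Setting `continuous: true` keeps the replication running as new changes arrive. The sketch below only builds the request. The account URLs are placeholders, and the helper `replication_request` is hypothetical.

```python
import json
import urllib.request

SERVER = "https://ACCOUNT.cloudant.com"  # placeholder account URL


def replication_request(source_url, target_url, continuous=False, create_target=False):
    """Build a POST /_replicate request that copies source_url into target_url.

    Replication is one-way; syncing in both directions needs two requests
    with source and target swapped.
    """
    body = {"source": source_url, "target": target_url}
    if continuous:
        body["continuous"] = True      # keep replicating new changes
    if create_target:
        body["create_target"] = True   # create the target database if missing
    return urllib.request.Request(
        url=SERVER + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = replication_request(
        "https://ACCOUNT.cloudant.com/people",
        "https://OTHER-ACCOUNT.cloudant.com/people-backup",
        continuous=True,
    )
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```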

Page 36

Runkeeper

Page 37

Page 38

Open Street Map Data

IBM Cloudant

Use from anywhere!

Daily updates

VM daily cron Python script

Always up to date! Currently 12,467,460 POIs
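The daily refresh can be sketched as a Python script that a cron job on the VM runs once a day. The script chains the download, extraction, conversion and upload commands from the following slides. The script structure, the function names and the file names are assumptions, not the author's actual script. By default, the script only prints the commands (a dry run). One deliberate difference from the slides is that it downloads with `wget -O` instead of `-c`, so that each run replaces yesterday's extract instead of trying to resume it.

```python
import os
import subprocess
import sys

# Tag rules copied from the osmosis slide.
TAG_RULES = [
    "aerialway=station", "aeroway=aerodrome,helipad,heliport",
    "amenity=*", "craft=*", "emergency=*",
    "highway=bus_stop,rest_area,services",
    "historic=*", "leisure=*", "office=*",
    "public_transport=stop_position,stop_area", "shop=*", "tourism=*",
]


def pipeline_commands(region="netherlands"):
    """Return (argv, stdin_path, stdout_path) steps mirroring the next slides."""
    pbf = "{}-latest.osm.pbf".format(region)
    nodes = "{}.nodes.osm".format(region)
    poi_osm = "{}.poi.osm".format(region)
    poi_json = "{}.poi.json".format(region)
    url = "http://download.geofabrik.de/europe/" + pbf
    return [
        (["wget", "-O", pbf, url], None, None),
        (["osmosis", "--read-pbf", pbf, "--tf", "accept-nodes"] + TAG_RULES
         + ["--tf", "reject-ways", "--tf", "reject-relations", "--write-xml", nodes],
         None, None),
        (["osmconvert", nodes, "--drop-ways", "--drop-author",
          "--drop-relations", "--drop-versions"], None, poi_osm),
        (["ogr2ogr", "-f", "GeoJSON", poi_json, poi_osm, "points"], None, None),
        # couchimport reads the Cloudant URL from the COUCH_URL environment variable.
        (["couchimport", "--db", "poi-" + region, "--type", "json",
          "--jsonpath", "features.*"], poi_json, None),
    ]


def run(region="netherlands", dry_run=True):
    """Print each step; also execute it unless dry_run is True."""
    if not dry_run:
        # ogr2ogr's GeoJSON driver may refuse to overwrite an existing file,
        # so remove yesterday's output first.
        stale = "{}.poi.json".format(region)
        if os.path.exists(stale):
            os.remove(stale)
    for argv, stdin_path, stdout_path in pipeline_commands(region):
        print(" ".join(argv))
        if dry_run:
            continue
        stdin = open(stdin_path) if stdin_path else None
        stdout = open(stdout_path, "w") if stdout_path else None
        try:
            # check=True stops the pipeline at the first failing command.
            subprocess.run(argv, stdin=stdin, stdout=stdout, check=True)
        finally:
            for handle in (stdin, stdout):
                if handle is not None:
                    handle.close()


if __name__ == "__main__":
    run(dry_run="--run" not in sys.argv)
```

A crontab entry such as `0 3 * * * python3 /path/to/update_pois.py --run` would run the script every night at 03:00. The path and the script name in that entry are hypothetical.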

Page 39

Several data sources: world, continent, country, city or a user-defined box

Several data formats, each with free conversion tools available: pbf, osm, json, shp

Example:

wget -c http://download.geofabrik.de/europe/netherlands-latest.osm.pbf

Page 40

Extract the POIs with osmosis

osmosis --read-pbf netherlands-latest.osm.pbf \
  --tf accept-nodes \
    aerialway=station \
    aeroway=aerodrome,helipad,heliport \
    amenity=* craft=* emergency=* \
    highway=bus_stop,rest_area,services \
    historic=* leisure=* office=* \
    public_transport=stop_position,stop_area \
    shop=* tourism=* \
  --tf reject-ways --tf reject-relations \
  --write-xml netherlands.nodes.osm

(osmosis is easy to install with Homebrew on a Mac)
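The `accept-nodes` filter keeps a node when at least one of its tags matches one of the listed `key=value` rules. In those rules, `*` accepts any value and a comma-separated list accepts any of its values. Ways and relations are not affected by a node filter, which is why the command rejects them explicitly. The Python sketch below imitates the node rule on in-memory tag dictionaries to show which nodes end up in the POI file. It illustrates the filter logic only and is not osmosis itself. The sample nodes are made up.

```python
# Tag rules from the osmosis command above: key -> accepted values ("*" = any).
RULES = {
    "aerialway": {"station"},
    "aeroway": {"aerodrome", "helipad", "heliport"},
    "amenity": {"*"}, "craft": {"*"}, "emergency": {"*"},
    "highway": {"bus_stop", "rest_area", "services"},
    "historic": {"*"}, "leisure": {"*"}, "office": {"*"},
    "public_transport": {"stop_position", "stop_area"},
    "shop": {"*"}, "tourism": {"*"},
}


def is_poi(tags, rules=RULES):
    """Return True if any tag of the node matches any key=value rule."""
    for key, value in tags.items():
        accepted = rules.get(key)
        if accepted and ("*" in accepted or value in accepted):
            return True
    return False


if __name__ == "__main__":
    # Made-up sample nodes for illustration.
    nodes = [
        {"amenity": "cafe", "name": "Koffiehuis"},
        {"highway": "traffic_signals"},
        {"highway": "bus_stop", "name": "Centraal Station"},
        {},
    ]
    for tags in nodes:
        print(is_poi(tags), tags)
```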

Page 41

Some cleaning up with osmconvert

osmconvert netherlands.nodes.osm --drop-ways --drop-author --drop-relations --drop-versions > netherlands.poi.osm

Convert from osm to json format with ogr2ogr

ogr2ogr -f GeoJSON netherlands.poi.json netherlands.poi.osm points

Page 42

Upload to Cloudant with couchimport

export COUCH_URL="https://username:[email protected]"

cat netherlands.poi.json | couchimport --db poi-netherlands --type json --jsonpath "features.*"

https://github.com/glynnbird/couchimport

IBM Cloudant
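With `--jsonpath "features.*"`, couchimport treats each element of the GeoJSON `features` array as its own document, and it writes those documents to the database in batches. The stdlib sketch below shows the same idea by splitting a FeatureCollection into request bodies for CouchDB's `POST /{db}/_bulk_docs` endpoint. The helper `bulk_payloads`, the default batch size and the sample features are assumptions for illustration. Note that GeoJSON stores coordinates as `[longitude, latitude]`.

```python
import json

# A tiny FeatureCollection shaped like ogr2ogr's GeoJSON output (values made up).
GEOJSON = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"osm_id": "1", "name": "Cafe A"},
         "geometry": {"type": "Point", "coordinates": [4.8952, 52.3702]}},
        {"type": "Feature", "properties": {"osm_id": "2", "name": "Bus stop B"},
         "geometry": {"type": "Point", "coordinates": [4.9003, 52.3786]}},
        {"type": "Feature", "properties": {"osm_id": "3", "name": "Shop C"},
         "geometry": {"type": "Point", "coordinates": [4.8834, 52.3667]}},
    ],
}


def bulk_payloads(collection, batch_size=500):
    """Yield {"docs": [...]} bodies for POST /{db}/_bulk_docs, one per batch.

    Each element of collection["features"] becomes one document, which is
    what the JSON path "features.*" selects.
    """
    features = collection["features"]
    for start in range(0, len(features), batch_size):
        yield {"docs": features[start:start + batch_size]}


if __name__ == "__main__":
    for body in bulk_payloads(GEOJSON, batch_size=2):
        print(len(body["docs"]), json.dumps(body)[:60])
```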

Page 43

Examples from

https://console.ng.bluemix.net/docs/services/Cloudant/api/cloudant-geo.html#cloudant-geospatial
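Cloudant Geo indexes GeoJSON geometries and answers queries such as "all points within a given radius of a location". To make clear what such a radius query returns, the sketch below performs the same selection locally, using the haversine great-circle distance. It is a plain-Python stand-in and not the Cloudant Geo API. The POIs, the centre point and the radii are made up.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres

# Made-up GeoJSON point features (coordinates are [longitude, latitude]).
POIS = [
    {"properties": {"name": "A"}, "geometry": {"type": "Point", "coordinates": [4.8952, 52.3702]}},
    {"properties": {"name": "B"}, "geometry": {"type": "Point", "coordinates": [4.9003, 52.3786]}},
    {"properties": {"name": "C"}, "geometry": {"type": "Point", "coordinates": [4.4777, 51.9244]}},
]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(features, lat, lon, radius_m):
    """Return the point features within radius_m metres of (lat, lon), nearest first."""
    hits = []
    for feature in features:
        f_lon, f_lat = feature["geometry"]["coordinates"]  # GeoJSON order: lon, lat
        distance = haversine_m(lat, lon, f_lat, f_lon)
        if distance <= radius_m:
            hits.append((distance, feature))
    hits.sort(key=lambda pair: pair[0])
    return [feature for _, feature in hits]


if __name__ == "__main__":
    for feature in within_radius(POIS, 52.3731, 4.8926, 1000):
        print(feature["properties"]["name"])
```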

Page 44

Page 45

UK Crime Data from https://data.police.uk/data/

Page 47

Python - requests
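The police.uk API returns street-level crimes near a point as a JSON list, from URLs of the form `https://data.police.uk/api/crimes-street/all-crime?lat=...&lng=...&date=YYYY-MM`. The sketch below builds that URL with the standard library and summarises a response by crime category. With the `requests` package named on the slide, the fetch itself would be `requests.get(url).json()`. The coordinates and sample records are made up, and the parameter names should be checked against the police.uk API documentation.

```python
from collections import Counter
from urllib.parse import urlencode

API_BASE = "https://data.police.uk/api/crimes-street/all-crime"


def crimes_url(lat, lng, month):
    """Build the street-level crimes URL for a point and a month (YYYY-MM)."""
    return API_BASE + "?" + urlencode({"lat": lat, "lng": lng, "date": month})


def count_by_category(crimes):
    """Count crime records per category, most common first."""
    return Counter(crime["category"] for crime in crimes).most_common()


if __name__ == "__main__":
    print(crimes_url(51.4545, -2.5879, "2017-01"))  # a point in central Bristol
    # Made-up records in the shape of the API response.
    sample = [
        {"category": "anti-social-behaviour", "month": "2017-01"},
        {"category": "bicycle-theft", "month": "2017-01"},
        {"category": "anti-social-behaviour", "month": "2017-01"},
    ]
    print(count_by_category(sample))
```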

Page 48

dashDB

Data warehouse

Page 49

Add the dashDB service in Bluemix

Add a service

Search for dashDB

Page 50

Page 51

Use an existing service

Search for tweets:

posted:2016-08-01,2016-10-01 followers_count:3000 friends_count:3000 (weather OR sun OR sunny OR rain OR hail OR storm OR rainy OR drought OR flood OR hurricane OR tornado OR cold OR snow OR drizzle OR cloudy OR thunder OR lightning OR wind OR windy OR heatwave)

REST API docs: https://new-console.ng.bluemix.net/docs/services/Twitter/twitter_rest_apis.html#rest_apis

Select table
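The query above combines a date range, follower and friend counts, and a long `OR` list of weather words. If the query string is built in code, the keyword list is easier to change. The helper `build_query` below is hypothetical. It simply reassembles the query string from this slide and makes no claims about how the service interprets each field.

```python
WEATHER_TERMS = [
    "weather", "sun", "sunny", "rain", "hail", "storm", "rainy", "drought",
    "flood", "hurricane", "tornado", "cold", "snow", "drizzle", "cloudy",
    "thunder", "lightning", "wind", "windy", "heatwave",
]


def build_query(start, end, followers, friends, terms):
    """Assemble a tweet search query string in the format shown on the slide."""
    keywords = "(" + " OR ".join(terms) + ")"
    return "posted:{},{} followers_count:{} friends_count:{} {}".format(
        start, end, followers, friends, keywords
    )


if __name__ == "__main__":
    print(build_query("2016-08-01", "2016-10-01", 3000, 3000, WEATHER_TERMS))
```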

Page 52

Page 53

Apache Spark

Page 54

Apache Spark

Page 55

Page 56

RDDs : Resilient Distributed Datasets

Data does not have to fit on a single machine

Data is separated into partitions

Creation of RDDs

Load an external dataset

Distribute a collection of objects

Transformations construct a new RDD from a previous one (lazy!)

Actions compute a result based on an RDD
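The split between lazy transformations and eager actions can be illustrated without a cluster. The `ToyRDD` class below is not Spark. It is a hypothetical single-machine model in which the data sits in partitions, `map` and `filter` only record steps, and the actions `collect` and `count` run those steps partition by partition. In PySpark, the corresponding calls are `sc.parallelize(...)`, `.map`, `.filter`, `.collect()` and `.count()`.

```python
class ToyRDD:
    """A minimal single-machine imitation of an RDD's lazy evaluation."""

    def __init__(self, partitions, steps=()):
        self.partitions = partitions  # list of lists: the data split into partitions
        self.steps = tuple(steps)     # recorded transformations, not yet run

    @classmethod
    def parallelize(cls, items, num_partitions=2):
        """Distribute a collection into contiguous slices, roughly as sc.parallelize does."""
        items = list(items)
        size = max(1, -(-len(items) // num_partitions))  # ceiling division
        return cls([items[i:i + size] for i in range(0, len(items), size)] or [[]])

    # Transformations: return a new ToyRDD and do no work yet.
    def map(self, func):
        return ToyRDD(self.partitions, self.steps + (("map", func),))

    def filter(self, pred):
        return ToyRDD(self.partitions, self.steps + (("filter", pred),))

    def _compute_partition(self, partition):
        """Apply every recorded step to each item of one partition."""
        for item in partition:
            keep = True
            for kind, func in self.steps:
                if kind == "map":
                    item = func(item)
                elif not func(item):  # a filter step that rejects the item
                    keep = False
                    break
            if keep:
                yield item

    # Actions: trigger the actual computation.
    def collect(self):
        return [x for part in self.partitions for x in self._compute_partition(part)]

    def count(self):
        return sum(1 for part in self.partitions for _ in self._compute_partition(part))


if __name__ == "__main__":
    rdd = ToyRDD.parallelize(range(10), num_partitions=3)
    evens = rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 0)  # nothing runs yet
    print(rdd.partitions)
    print(evens.collect(), evens.count())  # the actions run the recorded steps
```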

Page 57

Load tweets from dashDB with Spark SQL

Page 58

Clean data, summarise and load into pandas DataFrame
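The cleaning and summarising step can be sketched with the standard library alone, as a stand-in for the Spark and pandas steps named on the slide. The tweet records, their `posted` and `text` fields, and the keyword subset are assumptions for illustration. The sketch lower-cases the text, strips URLs and @mentions, and then counts, for each day, how many tweets mention each weather word.

```python
import re
from collections import Counter, defaultdict

KEYWORDS = {"rain", "sunny", "snow", "storm", "wind"}  # subset of the search terms

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
WORD = re.compile(r"[a-z]+")


def clean(text):
    """Lower-case a tweet and replace URLs and @mentions with spaces."""
    return URL_OR_MENTION.sub(" ", text.lower())


def keyword_counts_per_day(tweets, keywords=KEYWORDS):
    """Return {day: Counter(keyword -> number of tweets mentioning it)}."""
    per_day = defaultdict(Counter)
    for tweet in tweets:
        day = tweet["posted"][:10]                       # keep YYYY-MM-DD
        words = set(WORD.findall(clean(tweet["text"])))  # each word once per tweet
        per_day[day].update(words & keywords)
    return dict(per_day)


if __name__ == "__main__":
    tweets = [  # made-up sample records
        {"posted": "2016-08-01T09:15:00Z", "text": "Rain again in #Bristol https://t.co/abc"},
        {"posted": "2016-08-01T17:40:00Z", "text": "@metoffice rain rain and wind"},
        {"posted": "2016-08-02T08:00:00Z", "text": "Sunny morning!"},
    ]
    for day, counts in sorted(keyword_counts_per_day(tweets).items()):
        print(day, dict(counts))
```

The resulting per-day counters could then be loaded into a pandas DataFrame for plotting, as the slide describes.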

Page 59

IBM Data Science Experience

Page 60

datascience.ibm.com

Page 61

Page 62

Getting started

▪ Go to datascience.ibm.com and sign in with your Bluemix account if you have one; otherwise, sign up for one at the top right of the screen

Page 63

Create a project

▪ Create a new project by clicking the link at the top of the screen
▪ Or go to My Projects in the menu on the left of the screen and click Create New Project

Page 64

Create a project

▪ Name the project
▪ Choose a Spark service
▪ Choose an Object Storage service
▪ Click Create

Page 65

Add collaborators

▪ Click add collaborator
▪ Search for your project members
▪ Select their permissions

Page 66

Add a notebook

▪ Click add notebooks

Page 67

Add a notebook

▪ Click add notebooks
▪ Pick your favourite language:
  ▪ Python 2
  ▪ Scala
  ▪ R
▪ Choose Spark 1.6 or 2.0
▪ Click Create Notebook

Page 68

Let’s write some code

▪ Click the pen icon to start adding code (edit mode)
▪ When collaborating, only one person can edit at a time; the others can add comments to the notebook when in view mode

Page 69

Example: Bristol open data

Page 70

Object-store

Page 71

Python package PixieDust

Page 72

Watson Machine Learning

Page 73

IBM Watson Data Platform

Bluemix

Data storage

Apps

Watson APIs

Weather

Data Science Experience

Watson Machine Learning

Watson Analytics

Page 74

Thanks!

https://github.com/MargrietGroenendijk/Bristol

http://www.slideshare.net/MargrietGroenendijk/presentations

@MargrietGr

https://medium.com/ibm-watson-data-lab