Open Data and Web API Sammy Fung Technology Sharing (28/1/2016) at Hong Kong Polytechnic University COMP

Upload: sammy-fung

Post on 15-Apr-2017


Open Data and Web API

Sammy Fung Technology Sharing (28/1/2016)

at Hong Kong Polytechnic University COMP

Sammy Fung

• President, Open Source Hong Kong.

• Conference Chair, Hong Kong Open Source Conference.

• Tags: Freelancer, Developer, Open Source, Open Data, Startup.

• Contacts:

[email protected]

• @sammyfung

• https://github.com/sammyfung

• These slides will be published on SlideShare under a Creative Commons license.

• Creative Commons BY-NC-SA (Attribution, Non-Commercial, ShareAlike)

Open Data

What is Data?

1. What is the current Air Quality Health Index (AQHI) of Causeway Bay?

AQHI

• Environmental Protection Department (EPD)

• Find it on the AQHI website run by EPD.

• http://www.aqhi.gov.hk/en.html

• What about the details of air quality in Causeway Bay? Look at the Pollutant Concentration of CWB.

So, we just found the “data” as humans.

How does a software program read the AQHI and Pollutant Concentration of CWB (the “data”)?

Software and Data

• A software program needs data as values.

• an integer or a float: eg. 2016, 6.89

• a character string: eg. “Causeway Bay”

• Retrieve data through an “Interface” for data.

• Application Programming Interface (API)

• Computer/Software/Program Readable Data Format.

• Humans communicate in a common language, eg. English, Cantonese, Mandarin.

• Programs do the same with common data formats: eg. XML, JSON.

XML Example

<developers>
  <developer>
    <firstName>Richard</firstName>
    <lastName>Stallman</lastName>
  </developer>
  <developer>
    <firstName>Linus</firstName>
    <lastName>Torvalds</lastName>
  </developer>
  <developer>
    <firstName>Eric</firstName>
    <lastName>Raymond</lastName>
  </developer>
</developers>

JSON Example

{"developers":[
  {"firstName":"Richard", "lastName":"Stallman"},
  {"firstName":"Linus", "lastName":"Torvalds"},
  {"firstName":"Eric", "lastName":"Raymond"}
]}
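To show how a program reads the two examples above, here is a minimal Python sketch using only the standard library. The function names are made up for illustration; the data is the same developer list as in the XML and JSON examples.

```python
import json
import xml.etree.ElementTree as ET

XML_TEXT = """
<developers>
  <developer><firstName>Richard</firstName><lastName>Stallman</lastName></developer>
  <developer><firstName>Linus</firstName><lastName>Torvalds</lastName></developer>
  <developer><firstName>Eric</firstName><lastName>Raymond</lastName></developer>
</developers>
"""

JSON_TEXT = """
{"developers":[
  {"firstName":"Richard", "lastName":"Stallman"},
  {"firstName":"Linus", "lastName":"Torvalds"},
  {"firstName":"Eric", "lastName":"Raymond"}
]}
"""


def names_from_xml(text):
    """Return (first, last) tuples from the XML document."""
    root = ET.fromstring(text.strip())
    return [
        (dev.findtext("firstName"), dev.findtext("lastName"))
        for dev in root.findall("developer")
    ]


def names_from_json(text):
    """Return (first, last) tuples from the JSON document."""
    data = json.loads(text)
    return [(dev["firstName"], dev["lastName"]) for dev in data["developers"]]


if __name__ == "__main__":
    print(names_from_xml(XML_TEXT))
    print(names_from_json(JSON_TEXT))
```

Both functions return the same list of Python tuples, which is the point of a machine-readable format: the program gets values, not text to guess at.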

Software and Data

• Web Scraping: Retrieve documents from a website.

• Information Extraction & Transformation:

• Extract and transform data from a common data format into data objects (variables) in a software program.

• eg. JSON -> Float(s)

• Data cleaning is needed for formats that are not machine-friendly.

• eg. HTML -> Float(s)
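The difference between the two cases can be sketched in a few lines of Python. Both input strings are hypothetical (the field name `pm25` and the HTML row are invented for illustration); only the idea, JSON maps straight to a float while HTML must be cleaned first, comes from the slide.

```python
import json
import re

# Hypothetical inputs: the same reading published in two formats.
JSON_DOC = '{"station": "Causeway Bay", "pm25": 6.89}'
HTML_DOC = '<tr><td>Causeway Bay</td><td class="v"> 6.89 ug/m3</td></tr>'


def value_from_json(doc):
    """JSON -> float: the parser already returns a number."""
    return float(json.loads(doc)["pm25"])


def value_from_html(doc):
    """HTML -> float: strip the tags, then search for the first number."""
    text = re.sub(r"<[^>]+>", " ", doc)
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError("no numeric value found")
    return float(match.group())


if __name__ == "__main__":
    print(value_from_json(JSON_DOC))
    print(value_from_html(HTML_DOC))
```

The HTML version works only as long as the page layout stays the same, which is why it counts as the harder method.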

Software and Data

• Programming Language: eg. Python

• Web Scraping library: import scrapy

• JSON library: import json

• Regular Expression library: import re

• Other libraries (eg. database): import mysql.connector

Installing Scrapy

• Scrapy is a web scraping framework written in Python.

• virtualenv ~/env/scrapy

• source ~/env/scrapy/bin/activate

• pip install scrapy

Try in Scrapy Shell

• scrapy startproject demo1

• scrapy shell http://www.aqhi.gov.hk/epd/ddata/html/out/24aqhi_Eng.xml

• a = response.xpath("//item[contains(.//StationName/text(), 'Causeway Bay')]/aqhi/text()").extract()

• b = a[-1] # b is the last value, still a string

• c = int(b) # c is an integer

• print(repr(b), c) # show the difference
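The same extraction can be tried without Scrapy or network access. The sketch below uses Python's standard library on a hypothetical sample: only the element names `item`, `StationName` and `aqhi` come from the XPath above; the root element, the values and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the feed queried above.
SAMPLE = """
<feed>
  <item><StationName>Central</StationName><aqhi>4</aqhi></item>
  <item><StationName>Causeway Bay</StationName><aqhi>5</aqhi></item>
  <item><StationName>Causeway Bay</StationName><aqhi>6</aqhi></item>
</feed>
"""


def last_aqhi(xml_text, station):
    """Return the AQHI of the last <item> whose StationName contains station."""
    root = ET.fromstring(xml_text.strip())
    readings = []
    for item in root.iter("item"):
        name = item.findtext("StationName") or ""
        value = item.findtext("aqhi")
        if station in name and value is not None:
            readings.append(value)  # still strings at this point
    if not readings:
        raise LookupError("no reading for " + station)
    return int(readings[-1])  # string -> integer


if __name__ == "__main__":
    print(last_aqhi(SAMPLE, "Causeway Bay"))
```

ElementTree supports only a subset of XPath, so the `contains()` test is written as a plain Python condition here.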

AQHI

• Phase 1: EPD provided AQHI in XML format.

• Phase 2: EPD provided both AQHI and Pollutant Concentration in XML format.

2. What is the current temperature in Shatin?

Weather

• Hong Kong Observatory (HKO)

• http://www.weather.gov.hk

• Top Hobbyist Website: Weather Underground http://www.weather.org.hk/

So, once again we found the “data” as humans.

How does a software program read the current temperature of Shatin (the “data”)?

Sorry! You need to subscribe to the commercial paid data feed service provided by HKO.

XD

But……

We can do it by scraping the HTML document (a harder method)

Try in Scrapy

• scrapy shell http://www.weather.gov.hk/wxinfo/ts/text_readings_e.htm

• a = response.xpath("//pre").extract()[0]

• import re

• b = re.split("\n", a)

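Clean Data with RE

c = ''
for i in b:
    # use the first line that starts with "Sha Tin"
    if re.search("^Sha Tin", i) and c == '':
        c = re.sub("^Sha Tin *", "", i)  # drop the station name
        c = re.sub(" .*", "", c)  # drop everything after the first value
print(repr(c))  # c is a string
print(float(c))  # converted to a float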
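The cleaning step can be wrapped into a small function that handles every station row at once. This is only a sketch: the sample text, the column layout (station name, two or more spaces, then the value) and the "N/A" marker are assumptions, and the real page may be laid out differently.

```python
import re

# Hypothetical sample of the text inside <pre>; the real layout may differ.
SAMPLE_PRE = """\
Sha Tin            17.3       85
Sai Kung           N/A        80
Tai Po             16.9       88
"""

# Station name, then at least two spaces, then the first value column.
ROW = re.compile(r"^(?P<station>[A-Za-z' ]+?) {2,}(?P<value>\S+)")


def parse_readings(text):
    """Map station name -> first numeric column, skipping unusable rows."""
    readings = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue
        try:
            readings[match.group("station")] = float(match.group("value"))
        except ValueError:
            continue  # value is not a number, eg. "N/A"
    return readings


if __name__ == "__main__":
    print(parse_readings(SAMPLE_PRE))
```

Skipping rows that fail to convert keeps one bad station from breaking the whole scrape.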

Open Data

Open Data

• Discoverable

• Available and searchable on the Internet.

• Structured

• Open and machine-readable format.

• Unconditional

• The legal framework allows anyone to reproduce and repurpose the data.

5-star Open Data Deployment Scheme

• Tim Berners-Lee, the inventor of the Web.

• 5stardata.info

• 1 Star: make your stuff available on the Web (whatever format) under an open license.

• 2 Star: make it available as structured data

• eg. Excel instead of image scan of a table

• 3 Star: use non-proprietary formats

• eg. CSV instead of Excel

• 4 Star: use URIs to denote things, so that people can point at your stuff

• 5 Star: link your data to other data to provide context.

Open Data in Hong Kong

• OGCIO

• DATA.ONE in 2011.

• data.gov.hk in 2015.

• JSON/XML, RSS, XLS, CSV, JPEG/PNG,….

• Defines the workflow for other government departments to release open data.

• OGCIO cannot decide which data and formats are released.

• The decision is made by the data owner in each government department.

Open Data in Hong Kong

• LegCo

• http://www.legco.gov.hk

• Voting results of LegCo meetings and some committee meetings were released in XML in Oct 2013.

• An API became available in Fall 2014.

• Not part of DATA.ONE / DATA.GOV.HK.

HK Air Quality Data

• AQHI, the old API (Air Pollution Index) and Pollutant Concentration

• XML Data for past 24 hours.

• CSV Data for all past records.

• EPD released AQHI and the old API in phase 1 a few years ago.

• EPD also released Pollutant Concentration data in machine-readable format in phase 2 one year ago.

Weather in DATA.GOV.HK

• I posted a blog article 'Progress of Open Government Data in Hong Kong' on 2013/01/17.

• The Weather category at Data.One released only 7 datasets.

• All datasets are in RSS (XML) format, whose items include a title and a description only.

• Hourly weather reports, weather forecasts and special reports in 3 languages.

• Examples of missing data:

• Regional weather data, updated from stations every 10 minutes.

• One word: Useless.

• The RSS datasets on DATA.GOV.HK are completely different from the HKO paid service (XML data feed).
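To illustrate why RSS items with only a title and a description are awkward for software, here is a hedged sketch. The RSS sample and the wording of its description are invented; the point is that the number has to be dug out of free text instead of being read from a dedicated field.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS item; the description wording is an assumption.
RSS = """
<rss version="2.0"><channel>
  <title>Current Weather Report</title>
  <item>
    <title>Bulletin updated at 10:02 HKT</title>
    <description>Air temperature : 18 degrees Celsius. Relative Humidity : 72 per cent.</description>
  </item>
</channel></rss>
"""


def temperature_from_rss(rss_text):
    """Search the free-text description for a temperature value."""
    root = ET.fromstring(rss_text.strip())
    description = root.findtext("./channel/item/description") or ""
    match = re.search(
        r"temperature\s*:\s*(-?\d+(?:\.\d+)?)", description, re.IGNORECASE
    )
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(temperature_from_rss(RSS))
```

If the provider rewords the sentence, the regular expression silently stops matching, which a structured field would not do.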

Web API

API

• API = Application Programming Interface

• Retrieve data through “Interface”.

• C API, Python API, Objective-C API, Java API……

Web API

• API for Web Server or Web Browser/Client.

• Usually Web APIs are used for connecting to 3rd party web services.

• Request and Response messaging interface via Web (HTTP) defined by service providers.

• Request URI example: https://api.twitter.com/1.1/statuses/user_timeline.json

• Data are exchanged in JSON or XML format.
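A request/response exchange of this kind can be sketched with Python's standard library. The host `api.example.com`, the query parameters and the sample response body are assumptions for illustration; a real service documents its own URIs, parameters and authentication. Nothing is sent over the network here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint modelled on the request URI example above.
BASE = "https://api.example.com/1.1/statuses/user_timeline.json"


def build_request(screen_name, count):
    """Build (but do not send) a GET request with query parameters."""
    query = urlencode({"screen_name": screen_name, "count": count})
    return Request(BASE + "?" + query, headers={"Accept": "application/json"})


def parse_response(body):
    """Turn a JSON response body into a list of message texts."""
    return [item["text"] for item in json.loads(body)]


# Hypothetical response body, as a service might return it.
SAMPLE_BODY = '[{"id": 1, "text": "Hello open data"}, {"id": 2, "text": "Hello web API"}]'

if __name__ == "__main__":
    req = build_request("sammyfung", 2)
    print(req.get_method(), req.full_url)
    print(parse_response(SAMPLE_BODY))
```

Sending the request would be a call to `urllib.request.urlopen(req)`, left out so the sketch runs offline.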

REST

• Representational state transfer

• One of the reference styles of data exchange for Web 2.0.

• Web APIs are usually designed in REST style.

• Systems communicate using HTTP verbs over HTTP.

• HTTP Verbs: GET, POST, PUT, DELETE,…

• GET: list or retrieve data

• POST: create data

• PUT: update or replace data

• DELETE: delete data
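The four verbs above can be sketched as requests against one resource. The base URL and the payloads are hypothetical, and the requests are only built, not sent.

```python
import json
from urllib.request import Request

BASE = "https://api.example.com/stations"  # hypothetical resource


def rest_request(method, path="", payload=None):
    """Build an HTTP request for the given verb; JSON-encode any payload."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE + path, data=data, headers=headers, method=method)


examples = [
    rest_request("GET"),                                      # list stations
    rest_request("POST", payload={"name": "Causeway Bay"}),   # create one
    rest_request("PUT", "/1", payload={"name": "Central"}),   # replace no. 1
    rest_request("DELETE", "/1"),                             # delete no. 1
]

if __name__ == "__main__":
    for req in examples:
        print(req.get_method(), req.full_url)
```

The URI names the thing and the verb names the action, so the same `/stations/1` address serves both PUT and DELETE.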

Communication Flow with API

• Authorization

• Retrieve a token so that your web/mobile/backend apps can use the 3rd party API services.

• Redirect users to the 3rd party service for one-time authentication (eg. username, password); the token is then used for later access until it expires.

• For Application or Application-User Authorization.

• eg. OAuth, XAuth.

• Make your web API calls.

• Mind the API rate limits.
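The flow above (get a token, call the API, respect the rate limit) can be sketched as follows. Everything here is an assumption made for illustration: the `Token` class, the "Bearer" header scheme (used by OAuth 2.0; other schemes sign each request instead) and the fixed-window limit of 15 calls per 900 seconds. Real services define their own rules.

```python
from urllib.request import Request


class Token:
    """Hypothetical access token with an expiry time (seconds since epoch)."""

    def __init__(self, value, expires_at):
        self.value = value
        self.expires_at = expires_at

    def is_expired(self, now):
        return now >= self.expires_at


def authorized_request(url, token, now):
    """Attach the token to a request, refusing to use an expired one."""
    if token.is_expired(now):
        raise PermissionError("token expired; run the authorization step again")
    return Request(url, headers={"Authorization": "Bearer " + token.value})


def seconds_to_wait(calls_made, limit, window_start, window_seconds, now):
    """Seconds to pause before the next call under a fixed-window rate limit."""
    window_over = now - window_start >= window_seconds
    if window_over or calls_made < limit:
        return 0.0
    return float(window_start + window_seconds - now)


if __name__ == "__main__":
    token = Token("abc123", expires_at=3600)
    req = authorized_request("https://api.example.com/me", token, now=10)
    print(req.get_header("Authorization"))
    print(seconds_to_wait(15, 15, window_start=0, window_seconds=900, now=300))
```

Passing `now` in as a parameter, instead of reading the clock inside the functions, keeps the sketch easy to test.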

Tweepy

• A 3rd party Twitter library for Python.

• pip install tweepy

• http://tweepy.readthedocs.org

Open Data and Web API

• The structure of open data and the syntax of web APIs are changed by service / data providers from time to time.

• Subscribe to the developer blogs of those API and data services if possible.

• Use existing open source software tools to access web APIs; otherwise build your own tools (and consider making them open source).

Open Source Software (Very Quick Version)

Open Source Software

• Open Source = Source code is available to the public.

• License: Licensed under one of the open source licenses.

• Freedom: Freely (re-)distribute

• You can charge for distribution costs, but almost no one does.

• GitHub: Rich open source software library

• Git: a distributed version control software tool.

References

• Free Software Code: www.github.com

• Community: opensource.hk

• Conference: 2016.opensource.hk

• 6/24-25 Cyberport

• This Slide: slideshare.com/sammyfung

• Contact: [email protected] / @sammyfung