Open Data and Web API
Sammy Fung Technology Sharing (28/1/2016)
at Hong Kong Polytechnic University COMP
Sammy Fung
• President, Open Source Hong Kong.
• Conference Chair, Hong Kong Open Source Conference.
• Tags: Freelancer, Developer, Open Source, Open Data, Startup.
• Contacts:
• @sammyfung
• https://github.com/sammyfung
• The presentation slides will be published on SlideShare under a CC license.
• Creative Commons BY-NC-SA (Attribution, Non-Commercial, ShareAlike)
AQHI
• Environmental Protection Department (EPD)
• Check the current AQHI on the AQHI website run by EPD.
• http://www.aqhi.gov.hk/en.html
• What about the details of air quality in Causeway Bay? Look at the Pollutant Concentration for Causeway Bay (CWB).
Software and Data
• A software program needs data values.
• an integer or a float: eg. 2016, 6.89
• a character string: eg. “Causeway Bay”
• Retrieve data through “Interface” for data.
• Application Programming Interface (API)
• Computer/Software/Program Readable Data Format.
• Humans communicate in a common language, eg. English, Cantonese, Mandarin.
• Data Formats: eg. XML, JSON.
XML Example
<developers>
<developer>
<firstName>Richard</firstName> <lastName>Stallman</lastName>
</developer>
<developer>
<firstName>Linus</firstName> <lastName>Torvalds</lastName>
</developer>
<developer>
<firstName>Eric</firstName> <lastName>Raymond</lastName>
</developer>
</developers>
JSON Example
{"developers":[
{"firstName":"Richard", "lastName":"Stallman"},
{"firstName":"Linus", "lastName":"Torvalds"},
{"firstName":"Eric", "lastName":"Raymond"}
]}
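Both examples carry the same three records. A minimal sketch, using only Python's standard library, of loading each format into ordinary Python objects; the function and variable names are illustrative.

```python
import json
import xml.etree.ElementTree as ET

XML_TEXT = """
<developers>
  <developer><firstName>Richard</firstName><lastName>Stallman</lastName></developer>
  <developer><firstName>Linus</firstName><lastName>Torvalds</lastName></developer>
  <developer><firstName>Eric</firstName><lastName>Raymond</lastName></developer>
</developers>
""".strip()

JSON_TEXT = """
{"developers":[
  {"firstName":"Richard", "lastName":"Stallman"},
  {"firstName":"Linus", "lastName":"Torvalds"},
  {"firstName":"Eric", "lastName":"Raymond"}
]}
"""

def developers_from_xml(text):
    # Walk each <developer> element and copy its child text into a dict.
    root = ET.fromstring(text)
    return [
        {"firstName": d.findtext("firstName"), "lastName": d.findtext("lastName")}
        for d in root.findall("developer")
    ]

def developers_from_json(text):
    # json.loads already returns nested dicts and lists.
    return json.loads(text)["developers"]

xml_devs = developers_from_xml(XML_TEXT)
json_devs = developers_from_json(JSON_TEXT)
print(xml_devs == json_devs)       # the two formats decode to the same data
print(json_devs[1]["firstName"])
```

The JSON text maps directly onto dicts and lists, while the XML needs a small amount of code to decide which elements become which fields.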
Software and Data
• Web Scraping: Retrieve documents from a website.
• Information Extraction & Transformation:
• Extract and transform data from a common data format into data objects (variables) in a software program.
• eg. JSON -> Float(s)
• “Clean Data” (data cleaning) is needed when the source format is not machine-friendly.
• eg. HTML -> Float(s)
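The two cases can be sketched as follows. Both input snippets are made-up illustrations rather than output from any real site, and the field name `value` is an assumption.

```python
import json
import re

# Hypothetical inputs carrying the same reading in two formats.
JSON_DOC = '{"station": "Causeway Bay", "value": 6.89}'
HTML_DOC = '<tr><td class="name">Causeway Bay</td><td> 6.89 </td></tr>'

def float_from_json(doc):
    # Structured format: the parser already yields a float.
    return json.loads(doc)["value"]

def float_from_html(doc):
    # Presentation format: strip the tags, then search for the first number.
    text = re.sub(r"<[^>]+>", " ", doc)
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError("no number found")
    return float(match.group())

print(float_from_json(JSON_DOC))
print(float_from_html(HTML_DOC))
```

The HTML path depends on the page layout staying the same, which is why cleaning code tends to break when a site is redesigned.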
Software and Data
• Programming Language: eg. Python
• Web Scraping library: import scrapy
• JSON library: import json
• Regular Expression library: import re
• Other libraries (eg. database): import mysql
Installing Scrapy
• Scrapy is a web scraping framework written in Python.
• virtualenv ~/env/scrapy
• source ~/env/scrapy/bin/activate
• pip install scrapy
Try in Scrapy Shell
• scrapy startproject demo1
• scrapy shell http://www.aqhi.gov.hk/epd/ddata/html/out/24aqhi_Eng.xml
• a = response.xpath("//item[contains(.//StationName/text(), 'Causeway Bay')]/aqhi/text()").extract()
• b = a[-1] # b is a string
• c = int(a[-1]) # c is an integer
• print((b, c)) # printed as a tuple, so the quotes show which one is the string
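Outside the Scrapy shell, the same extraction can be sketched with the standard library. The element names `item`, `StationName` and `aqhi` come from the XPath above; the root element name, the sample document and its values are invented for illustration and are not real EPD readings.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the feed the XPath above expects.
SAMPLE = """<feed>
  <item><StationName>Central</StationName><aqhi>3</aqhi></item>
  <item><StationName>Causeway Bay</StationName><aqhi>4</aqhi></item>
  <item><StationName>Causeway Bay</StationName><aqhi>5</aqhi></item>
</feed>"""

def last_aqhi(xml_text, station):
    """Return the last listed AQHI for a station as an integer."""
    root = ET.fromstring(xml_text)
    values = [
        item.findtext("aqhi")
        for item in root.iter("item")
        if station in (item.findtext("StationName") or "")
    ]
    if not values:
        raise LookupError("no items for station: " + station)
    return int(values[-1])  # string -> integer

print(last_aqhi(SAMPLE, "Causeway Bay"))
```

A real feed may contain values that `int()` rejects (the AQHI scale includes a "10+" band), so production code would need to handle that case.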
AQHI
• Phase 1: EPD provided AQHI in XML format.
• Phase 2: EPD provided both AQHI and Pollutant Concentration in XML format.
Weather
• Hong Kong Observatory (HKO)
• http://www.weather.gov.hk
• Top Hobbyist Website: Weather Underground http://www.weather.org.hk/
Try in Scrapy
• scrapy shell http://www.weather.gov.hk/wxinfo/ts/text_readings_e.htm
• a = response.xpath("//pre").extract()[0]
• import re
• b = re.split("\n", a)
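Clean Data with RE
c = ''
for i in b:
    # keep only the first line that starts with "Sha Tin"
    if re.search("^Sha Tin", i) and c == '':
        c = re.sub("^Sha Tin *", "", i)  # drop the station name
        c = re.sub(" .*", "", c)         # drop everything after the first value
print(c)         # c is a string
print(float(c))  # converted to a float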
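The loop above can be wrapped in a small reusable function. This is a sketch: the name `first_reading` is hypothetical, and the sample text only imitates a station-per-line listing with made-up station rows and numbers; it is not actual HKO output.

```python
import re

# Made-up listing: station name followed by whitespace-separated values.
SAMPLE = """\
Sha Tin              14.2     81
Sha Tin East         14.0     80
Tai Po               13.8     85
"""

def first_reading(text, station):
    """Return the first numeric column for `station`, or None if absent."""
    # Anchor at line start; require a number right after the station name,
    # so "Sha Tin" does not match the "Sha Tin East" row.
    pattern = re.compile(r"^" + re.escape(station) + r"\s+(-?\d+(?:\.\d+)?)")
    for line in text.split("\n"):
        match = pattern.search(line)
        if match:
            return float(match.group(1))
    return None

print(first_reading(SAMPLE, "Sha Tin"))
print(first_reading(SAMPLE, "Lantau"))
```

Capturing the number in a group replaces the two `re.sub` calls with a single match.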
Open Data
• Discoverable
• Available and searchable on the Internet.
• Structured
• Open and machine-readable format.
• Unconditional
• The legal framework allows anyone to reproduce and repurpose the data.
5-star Open Data Deployment Scheme
• Tim Berners-Lee, the inventor of the Web.
• 5stardata.info
• 1 Star: make your stuff available on the Web (whatever format) under an open license.
• 2 Star: make it available as structured data
• eg. Excel instead of image scan of a table
• 3 Star: use non-proprietary formats
• eg. CSV instead of Excel
• 4 Star: use URIs to denote things, so that people can point at your stuff
• 5 Star: link your data to other data to provide context.
Open Data in Hong Kong
• OGCIO
• DATA.ONE in 2011.
• data.gov.hk in 2015.
• JSON/XML, RSS, XLS, CSV, JPEG/PNG, ...
• Defines the workflow for other government departments to release open data.
• OGCIO cannot decide which data and formats are released.
• The decision is made by the data owner in each government department.
Open Data in Hong Kong
• LegCo
• http://www.legco.gov.hk
• Voting results of LegCo meetings and some committee meetings were released in XML in Oct 2013.
• An API became available in Fall 2014.
• Not part of DATA.ONE / DATA.GOV.HK.
HK Air Quality Data
• AQHI, the old API (Air Pollution Index) and Pollutant Concentration
• XML data for the past 24 hours.
• CSV data for all past records.
• EPD released AQHI and the old API in phase 1, a few years ago.
• EPD also released Pollutant Concentration data in machine-readable format in phase 2, one year ago.
Weather in DATA.GOV.HK
• I posted a blog entry, 'Progress of Open Government Data in Hong Kong', on 2013/01/17.
• Weather at Data.One released only 7 datasets.
• All datasets are in RSS (XML) format, whose items include a title and description only.
• Hourly weather reports, weather forecasts and special reports in 3 languages.
• Examples of missing data:
• Regional weather data, updated from stations every 10 minutes.
• One word: Useless.
• The RSS datasets on DATA.GOV.HK are completely different from the HKO paid service (XML data feed).
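To illustrate why title-and-description RSS is hard to build on, here is a sketch that parses a made-up RSS item (the wording and the number are invented, not taken from any HKO feed). The parser returns only two strings per item, so any value still has to be dug out of prose with a regular expression.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical feed: each item has a title and a free-text description only.
RSS = """<rss version="2.0"><channel>
  <title>Sample weather feed</title>
  <item>
    <title>Current Weather Report</title>
    <description>At 10 a.m. the air temperature was 14 degrees Celsius.</description>
  </item>
</channel></rss>"""

def items(rss_text):
    """Return (title, description) pairs for every <item>."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title"), item.findtext("description"))
        for item in root.iter("item")
    ]

def temperature_from(description):
    # Fragile by design: this breaks as soon as the sentence is reworded.
    match = re.search(r"(-?\d+(?:\.\d+)?) degrees Celsius", description)
    return float(match.group(1)) if match else None

title, description = items(RSS)[0]
print(title)
print(temperature_from(description))
```

A feed with one element per measurement would make the second function unnecessary.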
API
• API = Application Programming Interface
• Retrieve data through “Interface”.
• C API, Python API, Objective-C API, Java API……
Web API
• API for a Web Server or Web Browser/Client.
• Web APIs are usually used for connecting to 3rd party web services.
• A request and response messaging interface over the Web (HTTP), defined by the service provider.
• Request URI example: https://api.twitter.com/1.1/statuses/user_timeline.json
• Data is exchanged in JSON or XML format.
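A sketch of the request/response pattern that does not touch the network: build the request URI with query parameters, then decode a JSON body. `screen_name` and `count` are used here as query parameters for this endpoint (check the provider's documentation before relying on them), the response body is a hand-written stand-in that is far smaller than a real reply, and a real call also needs authorization.

```python
import json
from urllib.parse import urlencode

BASE = "https://api.twitter.com/1.1/statuses/user_timeline.json"

def build_request_uri(base, **params):
    # urlencode escapes values and joins them as key=value&key=value.
    return base + "?" + urlencode(params)

uri = build_request_uri(BASE, screen_name="sammyfung", count=2)
print(uri)

# Stand-in response body; a real one is defined by the service provider.
FAKE_BODY = '[{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]'
tweets = json.loads(FAKE_BODY)
texts = [t["text"] for t in tweets]
print(texts)
```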
Web API Examples
• Payments: Paypal, MasterPass,…
• Online Services: Google, GitHub,…
• Social Networks: Twitter, Facebook,…
REST
• Representational State Transfer
• One of the reference styles of data exchange for Web 2.0.
• Web APIs are usually designed in REST style.
• Systems communicate using HTTP verbs over HTTP.
• HTTP Verbs: GET, POST, PUT, DELETE,…
• GET: list or retrieve data
• POST: create data
• PUT: update or replace data
• DELETE: delete data
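The verb-to-action mapping can be sketched with `urllib.request.Request`, which lets a request be constructed and inspected without being sent. The host `api.example.com` and the `/developers` resource are hypothetical.

```python
import json
from urllib.request import Request

BASE = "https://api.example.com/developers"  # hypothetical REST resource

def make_request(verb, path="", payload=None):
    """Build (but do not send) an HTTP request for the given verb."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE + path, data=data, headers=headers, method=verb)

requests_to_send = [
    make_request("GET"),                                   # list all
    make_request("GET", "/1"),                             # retrieve one
    make_request("POST", payload={"firstName": "Linus"}),  # create
    make_request("PUT", "/1", {"firstName": "Linus", "lastName": "Torvalds"}),  # replace
    make_request("DELETE", "/1"),                          # delete
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would perform the call against a real server.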
Communication Flow with API
• Authorization
• Retrieve a token so your web/mobile/backend apps can use the 3rd party API services.
• Redirect users to the 3rd party service for one-time authentication (eg. username, password); the token is then used for future access until it expires.
• For Application or Application-User Authorisation.
• eg. OAuth, XAuth.
• Make your web API calls.
• API Rate Limits
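A sketch of the last two steps: attach a token to each call and back off when the provider signals a rate limit. The `Authorization: Bearer` header, HTTP status 429 and the `Retry-After` header are common conventions but not universal, so treat them as assumptions; `send` is a hypothetical stand-in for whatever actually performs the HTTP request.

```python
import time

def call_api(send, url, token, max_attempts=3, sleep=time.sleep):
    """Call send(url, headers) -> (status, headers, body), retrying on 429."""
    headers = {"Authorization": "Bearer " + token}
    for _ in range(max_attempts):
        status, response_headers, body = send(url, headers)
        if status != 429:
            return status, body
        # Rate limited: wait as long as the server asks (default 1 second).
        sleep(float(response_headers.get("Retry-After", 1)))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)

# Fake sender for demonstration: rate limits the first call, then succeeds.
calls = []

def fake_send(url, headers):
    calls.append(headers["Authorization"])
    if len(calls) == 1:
        return 429, {"Retry-After": "0"}, ""
    return 200, {}, '{"ok": true}'

status, body = call_api(fake_send, "https://api.example.com/me", "TOKEN123")
print(status, body, len(calls))
```

Keeping the transport behind `send` makes the retry logic testable without network access or real credentials.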
Tweepy
• A 3rd party Twitter library for Python.
• pip install tweepy
• http://tweepy.readthedocs.org
Open Data and Web API
• The structure of open data and the syntax of web APIs will be changed by service / data providers from time to time.
• You should subscribe to the developer blogs of those API and data services if possible.
• Use existing open source software tools to access web APIs; otherwise build your own tools (and consider making them open source).
Open Source Software
• Open Source = Source code is available to the public.
• License: Licensed under one of the Open Source Licenses.
• Freedom: Freely (re-)distribute
• You can charge for distribution costs, but almost no one does so.
• GitHub: Rich open source software library
• Git: a distributed version control software tool.
References
• Free Software Code: www.github.com
• Community: opensource.hk
• Conference: 2016.opensource.hk
• 6/24-25 Cyberport
• This Slide: slideshare.com/sammyfung
• Contact: @sammyfung