Embulk makes Japan visible Kai Sasaki Treasure Data Inc.

Upload: kai-sasaki

Post on 21-Apr-2017





Embulk makes Japan visible

Kai Sasaki Treasure Data Inc.

Who am I?

• Kai Sasaki (@Lewuathe)

• Treasure Data Inc

• Maintaining and improving Hadoop infrastructure

• Hadoop, Spark contributor

Topic

• What is Embulk?

• Embulk ☓ GeoJSON

• DATA.GO.JP (http://www.data.go.jp/)

• DEMO

• Conclusion

What is Embulk?

• Parallel bulk data loader

• using plugins

• to make data integration relaxed

http://www.embulk.org/docs/

http://www.slideshare.net/frsyuki/embulk-56197273/4

Plugins

http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed/12

Plugins

http://www.embulk.org/plugins/

Embulk ☓ GeoJSON

• GeoJSON is a format for encoding geographic data structures

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [128.4, 37.0]
      },
      "properties": {
        "name": "Point A"
      }
    }
  ]
}
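As a rough illustration of the structure on this slide, the following JavaScript sketch assembles a FeatureCollection containing one Point feature. The helper names `pointFeature` and `featureCollection` are hypothetical and exist only for this example. GeoJSON positions are ordered longitude first, then latitude.

```javascript
// Hypothetical helpers for illustration; not part of any library.

// Build a GeoJSON Feature holding a single Point.
// GeoJSON positions are [longitude, latitude], in that order.
function pointFeature(longitude, latitude, properties) {
  return {
    type: "Feature",
    geometry: { type: "Point", coordinates: [longitude, latitude] },
    properties: properties,
  };
}

// Wrap a list of features in a FeatureCollection.
function featureCollection(features) {
  return { type: "FeatureCollection", features: features };
}

const collection = featureCollection([
  pointFeature(128.4, 37.0, { name: "Point A" }),
]);

console.log(JSON.stringify(collection, null, 2));
```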

Embulk ☓ GeoJSON

https://github.com/benbalter/dc-wifi-social/blob/master/bars.geojson

Embulk ☓ GeoJSON

• embulk-formatter-geojson (https://rubygems.org/gems/embulk-formatter-geojson)

• Converts any source data supported by an input plugin (CSV, TSV, JSON, MessagePack, etc.) into GeoJSON format.

$ embulk new ruby-formatter …

Embulk ☓ GeoJSON

id,name,population,…
1,Tokyo,1000,…
2,Osaka,800,…

template.geojson

{
  "id": 1,
  "properties": {
    "name": "Tokyo",
    "population": 1000
  },
  "geometry": <From template.geojson>
}
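The mapping on this slide can be sketched in plain JavaScript. This is only an illustration of the idea, not the plugin's actual implementation (the plugin itself is a Ruby gem). It assumes the template is a FeatureCollection whose features carry an `id` matching the `id` column of the rows. The function name `rowsToGeoJson`, the placeholder geometry, and the sample rows are hypothetical.

```javascript
// Sketch only: joins tabular rows with template geometries by identifier.
// Assumption: the template is a FeatureCollection whose features have an
// "id" matching the identifier column of the rows.
function rowsToGeoJson(rows, template, identifier) {
  // Index template geometries by feature id for quick lookup.
  const geometryById = new Map(
    template.features.map((feature) => [String(feature.id), feature.geometry])
  );

  const features = rows.map((row) => {
    // Every column except the identifier becomes a property.
    const { [identifier]: id, ...properties } = row;
    return {
      type: "Feature",
      id: id,
      properties: properties,
      // Geometry is null when the template has no matching feature.
      geometry: geometryById.get(String(id)) ?? null,
    };
  });

  return { type: "FeatureCollection", features: features };
}

// Hypothetical sample data (placeholder geometry, not real boundaries).
const template = {
  type: "FeatureCollection",
  features: [
    {
      type: "Feature",
      id: 1,
      properties: {},
      geometry: { type: "Point", coordinates: [139.7, 35.7] },
    },
  ],
};
const rows = [
  { id: 1, name: "Tokyo", population: 1000 },
  { id: 2, name: "Osaka", population: 800 },
];

const result = rowsToGeoJson(rows, template, "id");
console.log(JSON.stringify(result.features[0]));
```

Keying the lookup on the identifier keeps the geometry file and the statistics separate, so the same template can be reused with different datasets.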

embulk-formatter-geojson

$ embulk gem install embulk-formatter-geojson
$ cat config.yml
…
out:
  type: file
  formatter:
    type: geojson
    template_file: /path/to/template.geojson
    identifier: "id"
…
$ embulk run config.yml

DATA.GO.JP

http://www.data.go.jp/

DEMO

http://www.lewuathe.com/opendata/

d3.json(url, function(error, geoJp) {
  if (error) throw error;
  svg.selectAll("path")
    .data(geoJp.features)
    .enter().append("path")
    // Show the feature name on hover
    .on("mouseover", function(d) {
      $("#description").text(d.properties["name"]);
    })
    .attr("class", function(d) { return d.id; })
    .attr("d", geopath)
    // Pick the fill color from the embedded population property
    .attr("fill", function(d) {
      var prop = d.properties["population"];
      return colors[prop];
    });
});

• d3.js (https://d3js.org/)


• Embedded properties: d.properties["name"] and d.properties["population"] are read from each GeoJSON feature
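The fill callback in the d3.js snippet looks up `colors[prop]`, but the slides do not show how `colors` is built. One possible construction, purely an assumption, is a lookup table keyed by population value that interpolates between a light and a dark shade. The name `buildColorTable` and the two endpoint colors are hypothetical.

```javascript
// Sketch only: builds a lookup table from population value to a CSS color.
// The slide's `colors` object is not shown; this is one assumed construction.
function buildColorTable(populations, light = [222, 235, 247], dark = [8, 81, 156]) {
  const min = Math.min(...populations);
  const max = Math.max(...populations);
  const table = {};
  for (const population of populations) {
    // Position of this value between min and max (0 when all values are equal).
    const t = max === min ? 0 : (population - min) / (max - min);
    // Linear interpolation per RGB channel.
    const rgb = light.map((channel, i) =>
      Math.round(channel + t * (dark[i] - channel))
    );
    table[population] = `rgb(${rgb.join(", ")})`;
  }
  return table;
}

const colors = buildColorTable([1000, 800]);
console.log(colors[1000]); // "rgb(8, 81, 156)", the darkest shade
console.log(colors[800]);  // "rgb(222, 235, 247)", the lightest shade
```

A table keyed by exact value only works when every population in the data was passed to the builder. For continuous data, a scale function would be a more natural fit.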

Conclusion

• Embulk can be yet another format converter

• GeoJSON as a container including data and topology

• DATA.GO.JP provides various types of open data

Thank you!