Data Literacy Training - case of CA Election 70

Hands-on Training Data – what and how? A case of CA Election 70 YoungInnovations OpenNepal

Uploaded by: anjesh-tuladhar

Posted on 09-May-2015


DESCRIPTION

This hands-on training was given to journalists in December 2013 under the OpenNepal banner.

TRANSCRIPT

Page 1: Data Literacy Training - case of CA Election 70

Hands-on Training
Data – what and how?
A case of CA Election 70

YoungInnovations
OpenNepal

Page 2: Data Literacy Training - case of CA Election 70

Data → Story

● Find data
● Wrangle/clean up the data
● Merge data with others (if any)
● Filter and sort the data
● Analyze data
● Visualize data (story)

Page 3: Data Literacy Training - case of CA Election 70

CA Election 2070

● What is data?
– The candidates (age, gender, party)
– The constituencies (VDC, ward, party)
– The results (with votes, winner)

– …..

Page 4: Data Literacy Training - case of CA Election 70

Where to find it?

● http://election.gov.np
● The FPTP results data is published there in XML

Page 5: Data Literacy Training - case of CA Election 70

Data is not always this easy to find

● Scraping (requires programming knowledge)
– Using the Scraper extension for Google Chrome
● PDF conversion
● Manual transcription from PDF

Page 6: Data Literacy Training - case of CA Election 70

Chrome Scraper Extension

● To install, search for “Chrome extension Scraper” in the Chrome browser

Page 7: Data Literacy Training - case of CA Election 70

Scraper in Action

Page 8: Data Literacy Training - case of CA Election 70

PDF to Text

● Online tools are available
● Linux has a different set of utilities
● PDF is still a big nuisance (though something is better than nothing)

Page 10: Data Literacy Training - case of CA Election 70

PDF to Text

● Linux utility - pdftotext
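A minimal sketch of driving pdftotext from Python, assuming the Poppler pdftotext utility is installed. The file names are hypothetical. The `-layout` option asks pdftotext to keep the physical page layout, which usually helps when the PDF contains tables.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build the pdftotext command line as a list of arguments."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout preserves the physical layout of the page,
        # which tends to keep table columns aligned in the text output.
        cmd.append("-layout")
    cmd += [pdf_path, txt_path]
    return cmd


def convert_pdf(pdf_path, txt_path):
    """Run pdftotext if it is installed; return True on success."""
    if shutil.which("pdftotext") is None:
        return False  # utility not available on this machine
    result = subprocess.run(pdftotext_command(pdf_path, txt_path))
    return result.returncode == 0


# "results.pdf" and "results.txt" are hypothetical file names.
print(" ".join(pdftotext_command("results.pdf", "results.txt")))
```

The extracted text still needs manual cleanup before it can be used as a table.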

Page 11: Data Literacy Training - case of CA Election 70

CSV

● CSV - Comma-Separated Values
● Opens in MS Excel, OpenOffice, Google Spreadsheet
● Easy to work with
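A CSV file is plain text with one record per line, which is also why it is easy to read from a program. The sketch below uses Python's standard csv module on a small made-up sample; the rows are hypothetical and only the column names come from the election data.

```python
import csv
import io

# Hypothetical sample rows; the real file has one row per candidate.
SAMPLE = (
    "DISTNAME,CANDIDATE,TOTALVOTE\n"
    "Kathmandu,Candidate A,1200\n"
    "Lalitpur,Candidate B,950\n"
)


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)
for row in rows:
    print(row["DISTNAME"], row["TOTALVOTE"])
```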

Page 12: Data Literacy Training - case of CA Election 70

CA XML Data to CSV

Page 13: Data Literacy Training - case of CA Election 70

XML to CSV?

● Online services are available
● Might need help from a technologist
● On Linux there are several ways, e.g.

xml2 < FPTP-CA70.xml | 2csv FPTP DISTNAME CONST CANDIDATE AGE SEX PARTYNAME SYMBOLNAME TOTALVOTE STATUS > FPTP-CA70.csv
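The same conversion can be sketched in Python with the standard library. This assumes, based on the field list in the command above, that each candidate is an `<FPTP>` element whose child elements are named DISTNAME, CONST and so on; the actual structure of FPTP-CA70.xml is not shown in the slides, and the sample record below is made up.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Field names taken from the 2csv command; the XML layout is an assumption.
FIELDS = ["DISTNAME", "CONST", "CANDIDATE", "AGE", "SEX",
          "PARTYNAME", "SYMBOLNAME", "TOTALVOTE", "STATUS"]

# Hypothetical sample with a single record.
SAMPLE_XML = """<ROOT>
  <FPTP>
    <DISTNAME>Kathmandu</DISTNAME><CONST>1</CONST>
    <CANDIDATE>Candidate A</CANDIDATE><AGE>45</AGE><SEX>F</SEX>
    <PARTYNAME>Party X</PARTYNAME><SYMBOLNAME>Sun</SYMBOLNAME>
    <TOTALVOTE>1200</TOTALVOTE><STATUS>E</STATUS>
  </FPTP>
</ROOT>"""


def xml_to_csv(xml_text, record_tag="FPTP", fields=FIELDS):
    """Write one CSV line per record element, without a header row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for record in root.iter(record_tag):
        # A missing child element becomes an empty cell.
        writer.writerow([(record.findtext(name) or "").strip() for name in fields])
    return out.getvalue()


csv_text = xml_to_csv(SAMPLE_XML)
print(csv_text, end="")
```

Like the command-line version, this writes no header row, so the column names still have to be added afterwards.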

Page 14: Data Literacy Training - case of CA Election 70

OpenNepal

● Repository of datasets
– data in csv, xml or json format
● Request a dataset
● Request help in conversion from one format to another, scraping data, ...
● OpenNepal Community (Google Group) is very vibrant

Page 15: Data Literacy Training - case of CA Election 70

CA Results CSV data

● Converted from XML

http://dev.yipl.com.np/data-training/data/FPTP-CA70.csv

Page 16: Data Literacy Training - case of CA Election 70

Processing/Cleaning CSV – Basics

● Add header
● Sorting (by different fields)
● Filter
● Simple formulas

Page 17: Data Literacy Training - case of CA Election 70

Add headers

● Insert a row at the top
● Add a header for each column
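Outside a spreadsheet, the same step is a one-line text operation. The sketch below assumes the columns are in the order of the field list used in the XML conversion command; the function name is hypothetical.

```python
# Column names in the order used by the XML-to-CSV conversion (assumed).
HEADER = "DISTNAME,CONST,CANDIDATE,AGE,SEX,PARTYNAME,SYMBOLNAME,TOTALVOTE,STATUS"


def add_header(csv_text, header=HEADER):
    """Return csv_text with the header line at the top, added only once."""
    if csv_text.startswith(header + "\n"):
        return csv_text  # header already present
    return header + "\n" + csv_text


# Hypothetical data row.
with_header = add_header("Kathmandu,1,Candidate A,45,F,Party X,Sun,1200,E\n")
print(with_header, end="")
```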

Page 18: Data Literacy Training - case of CA Election 70

Sorting

● Sort by age – ascending, descending
● Find the age of the youngest winning candidate
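For comparison, here is a sketch of the same sort in Python. The sample rows are made up, and the value "E" marking a winner in the STATUS column is an assumption; check the actual file for the value it uses.

```python
import csv
import io

# Hypothetical sample; "E" as the winner marker is an assumption.
SAMPLE = """CANDIDATE,AGE,STATUS
Candidate A,52,E
Candidate B,31,E
Candidate C,27,
Candidate D,64,E
"""


def sort_by_age(rows, descending=False):
    """Sort rows numerically by AGE (the CSV stores it as text)."""
    return sorted(rows, key=lambda r: int(r["AGE"]), reverse=descending)


def youngest_winner(rows, winner_status="E"):
    """Return the winning row with the lowest age, or None if no winners."""
    winners = [r for r in rows if r["STATUS"] == winner_status]
    return min(winners, key=lambda r: int(r["AGE"])) if winners else None


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print([r["CANDIDATE"] for r in sort_by_age(rows)])
print(youngest_winner(rows)["CANDIDATE"])
```

Converting AGE with `int` matters: sorting the raw text would place "100" before "27".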

Page 19: Data Literacy Training - case of CA Election 70

Filtering

● Filter the list of winning female candidates
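A filter is a condition applied to every row. The sketch below assumes the SEX column uses "F" for female and the STATUS column uses "E" for a winner; both values are assumptions and are passed as parameters so they can be changed to match the real file.

```python
# Hypothetical sample rows.
ROWS = [
    {"CANDIDATE": "Candidate A", "SEX": "F", "STATUS": "E"},
    {"CANDIDATE": "Candidate B", "SEX": "M", "STATUS": "E"},
    {"CANDIDATE": "Candidate C", "SEX": "F", "STATUS": ""},
]


def winning_female(rows, female_value="F", winner_status="E"):
    """Keep only rows that are both female and marked as winners."""
    return [
        r for r in rows
        if r["SEX"] == female_value and r["STATUS"] == winner_status
    ]


winners = winning_female(ROWS)
print([r["CANDIDATE"] for r in winners])
```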

Page 20: Data Literacy Training - case of CA Election 70

Some exercises

● Are there candidates who didn't receive a single vote?
● What are the highest and lowest numbers of votes among candidates who didn't win?
● Find the percentage of female and male candidates, and the percentage of winning candidates who are female
● Try the above exercises for one district of your interest
● Think of other things you can do with these basic skills
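The first three exercises can be sketched in a few lines of Python. The rows are made up, and the "F"/"M" and "E" values for the SEX and STATUS columns are assumptions about how the file encodes gender and winners.

```python
# Hypothetical sample rows.
ROWS = [
    {"CANDIDATE": "Candidate A", "SEX": "F", "TOTALVOTE": "1200", "STATUS": "E"},
    {"CANDIDATE": "Candidate B", "SEX": "M", "TOTALVOTE": "900", "STATUS": ""},
    {"CANDIDATE": "Candidate C", "SEX": "M", "TOTALVOTE": "0", "STATUS": ""},
    {"CANDIDATE": "Candidate D", "SEX": "F", "TOTALVOTE": "300", "STATUS": ""},
]


def zero_vote_candidates(rows):
    """Rows whose TOTALVOTE is exactly zero."""
    return [r for r in rows if int(r["TOTALVOTE"]) == 0]


def loser_vote_range(rows, winner_status="E"):
    """(highest, lowest) vote counts among candidates who did not win."""
    votes = [int(r["TOTALVOTE"]) for r in rows if r["STATUS"] != winner_status]
    return (max(votes), min(votes)) if votes else None


def share(rows, field, value):
    """Percentage of rows whose field equals value."""
    if not rows:
        return 0.0
    matching = sum(1 for r in rows if r[field] == value)
    return 100.0 * matching / len(rows)


winners = [r for r in ROWS if r["STATUS"] == "E"]
print(len(zero_vote_candidates(ROWS)), loser_vote_range(ROWS))
print(share(ROWS, "SEX", "F"), share(winners, "SEX", "F"))
```

To repeat the exercise for one district, filter the rows on DISTNAME first and pass the filtered list to the same functions.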

Page 21: Data Literacy Training - case of CA Election 70

More questions

● How many parties have candidates in all 240 constituencies?

● How many male and female candidates are there in Nepali Congress? What is the male-female ratio in the far-west districts?

● Which party has the highest number of female candidates?
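These questions need grouping rather than simple filtering. The sketch below assumes a constituency is identified by the pair (DISTNAME, CONST), on the guess that constituency numbers restart in each district, and that "F" marks female candidates. The rows are made up.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: three constituencies, two parties.
ROWS = [
    {"DISTNAME": "Kathmandu", "CONST": "1", "PARTYNAME": "Party X", "SEX": "F"},
    {"DISTNAME": "Kathmandu", "CONST": "2", "PARTYNAME": "Party X", "SEX": "M"},
    {"DISTNAME": "Lalitpur", "CONST": "1", "PARTYNAME": "Party X", "SEX": "F"},
    {"DISTNAME": "Kathmandu", "CONST": "1", "PARTYNAME": "Party Y", "SEX": "M"},
    {"DISTNAME": "Lalitpur", "CONST": "1", "PARTYNAME": "Party Y", "SEX": "F"},
]


def parties_contesting_everywhere(rows):
    """Parties with at least one candidate in every constituency in rows."""
    all_seats = {(r["DISTNAME"], r["CONST"]) for r in rows}
    seats_by_party = defaultdict(set)
    for r in rows:
        seats_by_party[r["PARTYNAME"]].add((r["DISTNAME"], r["CONST"]))
    return sorted(p for p, seats in seats_by_party.items() if seats == all_seats)


def female_candidates_by_party(rows, female_value="F"):
    """Count female candidates per party."""
    return Counter(r["PARTYNAME"] for r in rows if r["SEX"] == female_value)


print(parties_contesting_everywhere(ROWS))
print(female_candidates_by_party(ROWS).most_common(1))
```

On the full dataset, `all_seats` should contain 240 entries; a different number would suggest that the constituency key is wrong or that rows are missing.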

Page 22: Data Literacy Training - case of CA Election 70

Data Processing - PivotTable

Page 23: Data Literacy Training - case of CA Election 70

PivotTable - more

● Breakdown of independent candidates
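A PivotTable counts rows for each combination of two columns. A rough equivalent in Python, using made-up rows and assumed "F"/"M" values for the SEX column, could look like this:

```python
from collections import Counter

# Hypothetical sample rows.
ROWS = [
    {"PARTYNAME": "Independent", "SEX": "F"},
    {"PARTYNAME": "Independent", "SEX": "M"},
    {"PARTYNAME": "Independent", "SEX": "M"},
    {"PARTYNAME": "Party X", "SEX": "F"},
]


def pivot_count(rows, row_field, col_field):
    """Count rows per (row_field, col_field) pair as a nested dictionary."""
    counts = Counter((r[row_field], r[col_field]) for r in rows)
    row_keys = sorted({key[0] for key in counts})
    col_keys = sorted({key[1] for key in counts})
    # Fill in zeros so every row has every column.
    return {
        rk: {ck: counts.get((rk, ck), 0) for ck in col_keys}
        for rk in row_keys
    }


table = pivot_count(ROWS, "PARTYNAME", "SEX")
for party, by_sex in table.items():
    print(party, by_sex, "total:", sum(by_sex.values()))
```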

Page 24: Data Literacy Training - case of CA Election 70

Let's look at the numbers again

● Sorted by total number of candidates

Page 25: Data Literacy Training - case of CA Election 70

Visualization

● Bar graph of male and female candidates in the top few districts

Page 26: Data Literacy Training - case of CA Election 70

What other visualizations are possible?

● https://github.com/mbostock/d3/wiki/Gallery

Page 28: Data Literacy Training - case of CA Election 70

Geocoding

● Geocoding
– the conversion of a human-readable location name into a numeric (or other machine-processable) location such as a longitude and latitude
– Kathmandu => [geocoding] => {latitude: 27.70169, longitude: 85.3206}
● Online tools available for geocoding
– Google Fusion Tables
– CartoDB

Page 29: Data Literacy Training - case of CA Election 70

Lat-long in maps.google.com

● Enter the lat-long (27.70169 85.3206) in the Google Maps search box

Page 30: Data Literacy Training - case of CA Election 70

Services available for geocoding

http://open.mapquestapi.com/nominatim/v1/search?format=xml&q=Kathmandu,Nepal
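A request URL like the one above can be built for any place name. The sketch below only constructs the URL and does not send the request; the endpoint is the one from the slide, and whether it is still reachable or now needs an API key is not checked here. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint as given on the slide; its current availability is not verified.
BASE_URL = "http://open.mapquestapi.com/nominatim/v1/search"


def geocode_url(place, fmt="xml"):
    """Build a search URL for a place name, escaping special characters."""
    return BASE_URL + "?" + urlencode({"format": fmt, "q": place})


url = geocode_url("Kathmandu,Nepal")
print(url)
```

`urlencode` escapes the comma as `%2C`, which is the encoded form of the same query.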

Page 31: Data Literacy Training - case of CA Election 70

Problems with this CSV

● District names are in Unicode (non-English) text
● Can't geocode them (currently only English names work)

Page 32: Data Literacy Training - case of CA Election 70

Adding English district names

http://dev.yipl.com.np/data-training/data/FPTP-CA70-eng.csv

Page 33: Data Literacy Training - case of CA Election 70

Google Fusion Table

● tables.googlelabs.com (needs a Gmail account)

Page 34: Data Literacy Training - case of CA Election 70

Imported data

Page 35: Data Literacy Training - case of CA Election 70

Geocoding

Page 36: Data Literacy Training - case of CA Election 70

Using filter in the map

Page 37: Data Literacy Training - case of CA Election 70

Using a heatmap based on votes

Page 38: Data Literacy Training - case of CA Election 70

Thank you