Data literacy training - case of CA Election 70
DESCRIPTION
This hands-on training was given to journalists in December 2013 under the OpenNepal banner.
TRANSCRIPT
Hands-on Training
Data – what and how?
A case of CA Election 70
YoungInnovations
OpenNepal
Data → Story
● Find data
● Wrangle/clean up the data
● Merge data with others (if any)
● Filter and sort the data
● Analyze data
● Visualize data (story)
CA Election 2070
● What is data?
– The candidates (age, gender, party)
– The constituencies (VDC, ward, party)
– The results (with votes, winner)
– …..
Where to find it?
● http://election.gov.np
● The FPTP results data is available there in XML
Not always lucky finding data
● Scraping (requires programming knowledge)
– Using the Scraper extension for Google Chrome
● PDF conversion
● Manual transcription of PDFs
Chrome Scraper Extension
● Search for “Chrome extension Scraper” in the Chrome browser to install it
Scraper in Action
PDF to Text
● Online tools available
● Linux has a different set of utilities
● PDF is still a big nuisance (though something is better than nothing)
PDF to Text
http://www.election.gov.np/election/uploads/files/ecn_report/constwisecandidatecount.pdf
PDF to Text
● Linux utility - pdftotext
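The pdftotext step can also be driven from a script. This is a minimal sketch, assuming the `pdftotext` utility is installed and the PDF has already been downloaded; the local file names are assumptions. The `-layout` option asks pdftotext to preserve the page layout, which tends to help with tabular reports.

```python
import shutil
import subprocess

def pdftotext_command(pdf_path, txt_path):
    """Build the pdftotext command line; -layout keeps table columns aligned."""
    return ["pdftotext", "-layout", pdf_path, txt_path]

def convert_pdf(pdf_path, txt_path):
    """Run pdftotext if it is installed; return True when the conversion ran."""
    if shutil.which("pdftotext") is None:
        return False  # utility not installed on this machine
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
    return True

# Assumed local file names for the downloaded report and its text output.
cmd = pdftotext_command("constwisecandidatecount.pdf", "constwisecandidatecount.txt")
print(" ".join(cmd))
```

The resulting text usually still needs manual cleanup before it can be saved as a table.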
CSV
● CSV - Comma-Separated Values
● Opens in MS Excel, OpenOffice, Google Spreadsheet
● Easy to work with
CA XML Data to CSV
XML to CSV?
● Online services are available
● Might need help from a technologist
● In Linux (there might be several ways, e.g.)
xml2 < FPTP-CA70.xml | 2csv FPTP DISTNAME CONST CANDIDATE AGE SEX PARTYNAME SYMBOLNAME TOTALVOTE STATUS > FPTP-CA70.csv
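Where xml2 and 2csv are not available, the same conversion can be sketched in Python. This assumes each record is an `<FPTP>` element whose child elements carry the field names used in the command above; the real file's layout may differ. The sample record is made up for illustration and is not real election data.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["DISTNAME", "CONST", "CANDIDATE", "AGE", "SEX",
          "PARTYNAME", "SYMBOLNAME", "TOTALVOTE", "STATUS"]

def xml_to_csv(xml_text, out):
    """Write one CSV row per <FPTP> record, with columns in FIELDS order."""
    root = ET.fromstring(xml_text)
    writer = csv.writer(out)
    count = 0
    for record in root.iter("FPTP"):
        # Missing child elements become empty cells.
        writer.writerow([(record.findtext(name) or "").strip() for name in FIELDS])
        count += 1
    return count

# Made-up sample record (assumed structure), not real election data.
SAMPLE = """<ROOT>
  <FPTP>
    <DISTNAME>District A</DISTNAME><CONST>1</CONST>
    <CANDIDATE>Candidate One</CANDIDATE><AGE>45</AGE><SEX>F</SEX>
    <PARTYNAME>Party X</PARTYNAME><SYMBOLNAME>Tree</SYMBOLNAME>
    <TOTALVOTE>1200</TOTALVOTE><STATUS>E</STATUS>
  </FPTP>
</ROOT>"""

buffer = io.StringIO()
rows_written = xml_to_csv(SAMPLE, buffer)
print(buffer.getvalue())
```

For the real file, pass the contents of FPTP-CA70.xml and an output file opened with `newline=""` and `encoding="utf-8"`.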
OpenNepal
● Repository of datasets
– data in csv, xml or json format
● Request for dataset
● Request for help in conversion from one format to another, scraping data, ...
● OpenNepal Community (Google Group) is very vibrant
CA Results CSV data
● Converted from XML
http://dev.yipl.com.np/data-training/data/FPTP-CA70.csv
Processing/Cleaning CSV – Basics
● Add header
● Sorting (by different fields)
● Filter
● Simple formulas
Add headers
● Insert a row at the top
● Add a header for each column
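The same step can be done in a script instead of a spreadsheet. This is a minimal sketch, assuming the CSV columns follow the field order of the xml2 | 2csv conversion; the sample row is made up.

```python
import csv
import io

# Assumed column order, taken from the field list used in the XML to CSV step.
HEADER = ["DISTNAME", "CONST", "CANDIDATE", "AGE", "SEX",
          "PARTYNAME", "SYMBOLNAME", "TOTALVOTE", "STATUS"]

def add_header(src, dst, header=HEADER):
    """Copy CSV rows from src to dst, writing the header row first."""
    writer = csv.writer(dst)
    writer.writerow(header)
    for row in csv.reader(src):
        writer.writerow(row)

# Made-up headerless row standing in for the real file.
src = io.StringIO("District A,1,Candidate One,45,F,Party X,Tree,1200,E\n")
dst = io.StringIO()
add_header(src, dst)
lines = dst.getvalue().splitlines()
print(lines[0])
```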
Sorting
● Sorting by Age – Ascending, Descending
● Find out the age of the youngest winning candidate
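For those who prefer a script to a spreadsheet, sorting by age could look like the sketch below. The rows are made up, and the STATUS value "E" as the marker for a winner is an assumption; check the actual values in the CSV.

```python
import csv
import io

# Made-up rows; STATUS "E" (elected) is an assumed marker for winners.
SAMPLE = """CANDIDATE,AGE,STATUS
Candidate One,45,E
Candidate Two,29,E
Candidate Three,25,
Candidate Four,61,E
"""

def sort_by_age(rows, descending=False):
    """Sort candidate rows by AGE, comparing as numbers rather than text."""
    return sorted(rows, key=lambda r: int(r["AGE"]), reverse=descending)

def youngest_winner(rows, winner_status="E"):
    """Return the winning row with the lowest age, or None if nobody won."""
    winners = [r for r in rows if r["STATUS"] == winner_status]
    return min(winners, key=lambda r: int(r["AGE"])) if winners else None

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
ascending = sort_by_age(rows)
youngest = youngest_winner(rows)
print(youngest["CANDIDATE"], youngest["AGE"])
```

Converting AGE to a number matters: sorted as text, "9" would come after "45".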
Filtering
● Filter the list of winning female candidates
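A spreadsheet filter corresponds to keeping only the rows that match some column values. A minimal sketch with made-up rows; the values "F" for female and "E" for a winner are assumptions, and the real file may use different (possibly Nepali) values.

```python
import csv
import io

# Made-up rows; "F" and "E" are assumed codes for female and elected.
SAMPLE = """CANDIDATE,SEX,STATUS
Candidate One,F,E
Candidate Two,M,E
Candidate Three,F,
"""

def filter_rows(rows, **criteria):
    """Keep only rows whose columns equal every given value."""
    return [r for r in rows if all(r[col] == val for col, val in criteria.items())]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
winning_women = filter_rows(rows, SEX="F", STATUS="E")
print([r["CANDIDATE"] for r in winning_women])
```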
Some exercises
● Are there people who didn't receive a single vote?
● What are the highest and lowest numbers of votes for a candidate who didn't win?
● Find the percentage of female and male candidates, and the percentage of winning female candidates
● Try the above exercises in one district of your interest
● Think of other things you can do with these basic skills
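The first few exercises can be sketched in code as follows. The rows are made up, and "F" and "E" are assumed codes for female and elected; to repeat the exercise for one district, filter the rows on DISTNAME first.

```python
import csv
import io

# Made-up rows; "F" and "E" are assumed codes, not taken from the real file.
SAMPLE = """CANDIDATE,SEX,TOTALVOTE,STATUS
Candidate One,F,1200,E
Candidate Two,M,900,
Candidate Three,M,0,
Candidate Four,F,15,
"""

def percentage(part, whole):
    """Percentage of part in whole, rounded to one decimal; 0.0 for an empty whole."""
    return round(100.0 * part / whole, 1) if whole else 0.0

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Candidates who did not receive a single vote.
zero_vote = [r["CANDIDATE"] for r in rows if int(r["TOTALVOTE"]) == 0]
# Vote counts of candidates who did not win.
loser_votes = [int(r["TOTALVOTE"]) for r in rows if r["STATUS"] != "E"]
# Share of female candidates among all candidates.
female_pct = percentage(sum(r["SEX"] == "F" for r in rows), len(rows))

print(zero_vote, max(loser_votes), min(loser_votes), female_pct)
```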
More questions
● How many parties have candidates in all 240 constituencies?
● How many male and female candidates are there in Nepali Congress? Ratio of male-female in far-west districts?
● Which party has the highest number of female candidates?
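The first question amounts to counting distinct constituencies per party. A minimal sketch with made-up rows; it assumes a constituency is identified by district name plus constituency number, since the numbers likely repeat across districts. For the real data, pass 240 as the total.

```python
from collections import defaultdict

# Made-up rows: (DISTNAME, CONST, PARTYNAME). Two constituencies in total here.
ROWS = [
    ("District A", "1", "Party X"),
    ("District A", "1", "Party Y"),
    ("District A", "2", "Party X"),
]

def parties_covering_all(rows, total_constituencies):
    """Return the parties that fielded a candidate in every constituency."""
    seats_by_party = defaultdict(set)
    for district, const, party in rows:
        # A constituency is identified by district plus constituency number.
        seats_by_party[party].add((district, const))
    return sorted(p for p, seats in seats_by_party.items()
                  if len(seats) == total_constituencies)

full_coverage = parties_covering_all(ROWS, total_constituencies=2)
print(full_coverage)
```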
Data Processing - PivotTable
PivotTable - more
● Breakdown of independent candidates
Let's see the numbers again
● Sorted by total number of candidates
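What a PivotTable computes can be sketched in a few lines: count the rows for each combination of two columns, then total and sort. The rows below are made up, and the party label "Independent" is a hypothetical value.

```python
from collections import Counter

# Made-up rows: (PARTYNAME, SEX). Labels are hypothetical.
ROWS = [
    ("Party X", "F"), ("Party X", "M"), ("Party X", "M"),
    ("Party Y", "M"),
    ("Independent", "F"), ("Independent", "M"),
]

def pivot_counts(rows):
    """Count candidates per (party, sex): party as rows, sex as columns."""
    cells = Counter(rows)
    parties = sorted({party for party, _ in rows})
    sexes = sorted({sex for _, sex in rows})
    # Missing combinations count as 0 because Counter returns 0 for absent keys.
    table = {p: {s: cells[(p, s)] for s in sexes} for p in parties}
    totals = {p: sum(table[p].values()) for p in parties}
    return table, totals

table, totals = pivot_counts(ROWS)
# Parties sorted by total number of candidates, largest first.
ranking = sorted(totals, key=totals.get, reverse=True)
print(table["Party X"], ranking)
```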
Visualization
● Bar graph of male-female candidates of top few districts
What other visualizations are possible?
● https://github.com/mbostock/d3/wiki/Gallery
Geocoding
● Geo-coding
– the conversion of a human-readable location name into a numeric (or other machine-processable) location such as a longitude and latitude
– Kathmandu => [geocoding] => {latitude: 27.70169, longitude: 85.3206}
● Online tools available for geocoding
– Google Fusion Tables
– CartoDB
Lat-long in maps.google.com
● Enter the lat-long (27.70169 85.3206) in the Google Maps search box
Services available for geocoding
http://open.mapquestapi.com/nominatim/v1/search?format=xml&q=Kathmandu,Nepal
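A request like the one above can be built and its reply read in a script. This is a rough sketch: it only builds the URL and parses a sample reply, without contacting the service, which may have changed or may now require a key. The sample reply is illustrative, shaped like Nominatim's XML output with `lat` and `lon` attributes on a `place` element; that shape is an assumption here, not a captured response.

```python
import urllib.parse
import xml.etree.ElementTree as ET

BASE_URL = "http://open.mapquestapi.com/nominatim/v1/search"

def geocode_url(place):
    """Build the search URL for a place name, requesting XML output."""
    query = urllib.parse.urlencode({"format": "xml", "q": place})
    return BASE_URL + "?" + query

def parse_first_place(xml_text):
    """Return (latitude, longitude) of the first <place> element, or None."""
    place = ET.fromstring(xml_text).find("place")
    if place is None:
        return None
    return float(place.get("lat")), float(place.get("lon"))

# Illustrative reply (assumed shape); not a captured response.
SAMPLE_RESPONSE = """<searchresults>
  <place lat="27.70169" lon="85.3206" display_name="Kathmandu, Nepal"/>
</searchresults>"""

url = geocode_url("Kathmandu,Nepal")
coords = parse_first_place(SAMPLE_RESPONSE)
print(url)
print(coords)
```

Note that `urlencode` escapes the comma in the query, which is equivalent to the unescaped form in the URL above.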
Problems with this CSV
● District names are in Unicode (Nepali)
● Can't geocode them (currently only English names work)
Adding English district names
http://dev.yipl.com.np/data-training/data/FPTP-CA70-eng.csv
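One way such a column could be added is with a lookup table from the Nepali name to the English one. A minimal sketch: the two-entry lookup is only illustrative, the column name `DISTNAME_ENG` is hypothetical, and the spellings in the real data may differ from the ones used here.

```python
import csv
import io

# Small illustrative lookup; a real one needs every district in the data.
DISTRICT_ENGLISH = {
    "काठमाडौं": "Kathmandu",
    "ललितपुर": "Lalitpur",
}

# Made-up rows; the last district name is deliberately missing from the lookup.
SAMPLE = "DISTNAME,CONST\nकाठमाडौं,1\nललितपुर,2\nअज्ञात,1\n"

def add_english_names(rows, lookup, column="DISTNAME_ENG"):
    """Add an English district column; unknown names get an empty value to fix by hand."""
    for row in rows:
        row[column] = lookup.get(row["DISTNAME"].strip(), "")
    return rows

rows = add_english_names(list(csv.DictReader(io.StringIO(SAMPLE))), DISTRICT_ENGLISH)
print([r["DISTNAME_ENG"] for r in rows])
```

Leaving unknown names empty makes it easy to spot which districts still need an entry in the lookup.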
Google Fusion Table
● tables.googlelabs.com (requires an @gmail account)
Imported data
Geocoding
Using filter in the map
Use of heatmap based on votes
Thank you