Weather Dataset Creation Using Web Scraping

Justin McMillan

Post on 19-May-2018

Gain an understanding of Web Scraping

Pull weather data from www.almanac.com

Write weather data to a file

Transform data to geographic format

Display weather data meaningfully

BeautifulSoup4 – to scrape HTML code

urllib2 – to access URLs

GDAL – to handle shapefiles

csv – to read from .csv files

ArcPy – to perform ArcGIS functions

URL format requires use of Zip Code & Date

Obtained CAZip.shp from ArcGIS Online
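Since every request needs a zip code and a date, the URL can be assembled by a small helper. The sketch below is only an illustration: the slides do not show the actual almanac.com URL layout, so `BASE_URL`, the path pattern, and the function name `build_history_url` are all assumptions. It is written for Python 3, whereas the slides list the Python 2 module urllib2.

```python
from datetime import date

# Hypothetical base address and path layout: the real almanac.com
# URL format is not shown in the slides and may differ.
BASE_URL = "https://www.almanac.com/weather/history"


def build_history_url(zip_code, day):
    """Return a (hypothetical) history URL for one zip code on one date."""
    zip_text = str(zip_code).zfill(5)  # keep any leading zeros
    return "{}/zipcode/{}/{}".format(BASE_URL, zip_text, day.isoformat())


print(build_history_url("90210", date(2014, 1, 15)))
```

Keeping the URL logic in one function means only that function has to change if the site uses a different layout.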

View on almanac.com:

Only use is viewing a single location on a single day

HTML Source Code:

Only need to extract key values

This is where BeautifulSoup4 comes in

Data can be accessed via a unique class

e.g. <div class="weatherhistory_results_datavalue_temp">
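A minimal sketch of looking up a value by its class with BeautifulSoup4 might look like this. Only the class name comes from the slides; the sample HTML, the value in it, and the helper name `extract_by_class` are made up for illustration, and the real page markup is likely more complex.

```python
from bs4 import BeautifulSoup

# Stand-in HTML: the class name is from the slides, everything else
# (surrounding markup, the value 57.2) is invented for this sketch.
SAMPLE_HTML = """
<html><body>
  <div class="weatherhistory_results_datavalue_temp">57.2</div>
</body></html>
"""


def extract_by_class(html, class_name):
    """Return the text of the first <div> with the given class, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("div", class_=class_name)
    if tag is None:
        return None
    return tag.get_text(strip=True)


print(extract_by_class(SAMPLE_HTML, "weatherhistory_results_datavalue_temp"))
```

Returning `None` when the class is missing lets the caller skip pages that have no data instead of crashing mid-run.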

Code: pulling data for a single day

This took about 30 minutes

Over 1700 zip codes
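The original code is not reproduced in this transcript. As a rough sketch of the idea, the loop below visits each zip code for one day and writes one row per result to a CSV file. The `fetch` argument is a hypothetical callable standing in for the download-and-parse step, and the column names are assumptions.

```python
import csv
import io


def scrape_to_csv(zip_codes, day, fetch, out_file):
    """Write one CSV row per zip code; return the number of rows written.

    `fetch` is a hypothetical callable (zip_code, day) -> dict or None.
    """
    writer = csv.writer(out_file)
    writer.writerow(["zip", "date", "station", "mean_temp"])
    written = 0
    for zip_code in zip_codes:
        record = fetch(zip_code, day)
        if record is None:  # no data available for this zip code
            continue
        writer.writerow([zip_code, day, record["station"], record["mean_temp"]])
        written += 1
    return written


def fake_fetch(zip_code, day):
    """Stand-in for the real scraping step, with made-up values."""
    if zip_code == "90002":
        return None
    return {"station": "STATION A", "mean_temp": 57.2}


buffer = io.StringIO()
count = scrape_to_csv(["90001", "90002", "90003"], "2014-01-15", fake_fetch, buffer)
print(count)
print(buffer.getvalue())
```

In a real run, `out_file` would be an open file handle rather than an in-memory buffer.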

Most data were duplicates

Using MS Excel, this was fixed with the click of a button

This reduced the data from over 1700 values to 83
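The slides do this step in Excel. For anyone who prefers to stay in Python, a comparable de-duplication could be sketched as below. This is an alternative, not the author's method, and it assumes duplicates are identified by a station-name column, which the slides do not state explicitly.

```python
def unique_by_key(rows, key):
    """Keep the first row seen for each distinct value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            unique.append(row)
    return unique


# Made-up sample rows: several zip codes report the same station.
rows = [
    {"zip": "90001", "station": "STATION A"},
    {"zip": "90002", "station": "STATION A"},
    {"zip": "93001", "station": "STATION B"},
]
stations = unique_by_key(rows, "station")
print(len(stations))
```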

Due to inconsistent station names and lack of a quality geocoder, I used Google Maps to create a .kml file of the weather stations

This took 20-30 minutes

The .kml was easily converted in ArcMap

Next I used a nested loop to determine dates to scrape data for:

I chose 9 different weather variables to pull

The timespan is easily changed to create larger datasets
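The loop itself is not reproduced in this transcript. One way such a nested loop could generate the dates is sketched below; the function name `dates_to_scrape` and its parameters are assumptions, and the actual script may have structured the loop differently.

```python
import calendar
from datetime import date


def dates_to_scrape(start_year, end_year, months=range(1, 13)):
    """Return every calendar date in the given years and months."""
    days = []
    for year in range(start_year, end_year + 1):
        for month in months:
            # monthrange returns (weekday of the 1st, number of days in month)
            last_day = calendar.monthrange(year, month)[1]
            for day in range(1, last_day + 1):
                days.append(date(year, month, day))
    return days


january_2014 = dates_to_scrape(2014, 2014, months=[1])
print(len(january_2014))
```

Widening `start_year` and `end_year` is all it takes to build a larger dataset, and `calendar.monthrange` takes care of month lengths and leap years.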

Using ArcPy, I simply joined the .csv file to the .shp file I created earlier:
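The ArcPy code is not reproduced in this transcript, and ArcPy only runs inside an ArcGIS installation. To show what the join does conceptually, here is a plain-Python sketch of a key-based attribute join. It is not the ArcPy call the author used, and the field names (`Name`, `station`, `mean_temp`) and sample values are made up.

```python
import csv
import io


def join_records(features, table_rows, feature_key, table_key):
    """Attach table columns to each feature whose key matches a table row."""
    lookup = {row[table_key]: row for row in table_rows}
    joined = []
    for feature in features:
        merged = dict(feature)
        match = lookup.get(feature[feature_key], {})
        for name, value in match.items():
            if name != table_key:  # do not duplicate the join key
                merged[name] = value
        joined.append(merged)
    return joined


# Made-up stand-ins for the .csv contents and the shapefile attributes.
csv_text = "station,mean_temp\nSTATION A,57.2\nSTATION B,61.0\n"
table_rows = list(csv.DictReader(io.StringIO(csv_text)))
features = [
    {"Name": "STATION A", "lat": 34.05, "lon": -118.25},
    {"Name": "STATION C", "lat": 36.75, "lon": -119.77},
]
joined = join_records(features, table_rows, "Name", "station")
print(joined[0])
```

Features with no matching row are kept unchanged, which mirrors an outer-style join where unmatched stations simply have no weather values.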

The result of the Python scripts is a tool that can be used to collect weather datasets for California covering roughly the last 30 years (as far back as almanac.com has data)

This dataset can be easily manipulated to display it in more meaningful ways

Remember, we started with this: