Weather dataset creation using web scraping
Gain an understanding of Web Scraping
Pull weather data from www.almanac.com
Write weather data to a file
Transform data to geographic format
Display weather data meaningfully
BeautifulSoup4 – to scrape HTML code
urllib2 – to access URLs
GDAL – to handle shapefiles
csv – to read from CSV files
ArcPy – to perform ArcGIS functions
Only need to extract key values
This is where BeautifulSoup4 comes in
Data can be accessed via a unique class
e.g. <div class="weatherhistory_results_datavalue_temp">
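The class-based lookup can be illustrated with a minimal sketch. It is not the script from this talk: to stay dependency-free it uses the standard-library HTMLParser as a stand-in, and the sample HTML, the `ClassTextExtractor` class, and the `extract_values` helper are all assumptions made for illustration. Only the class name comes from the slide. With BeautifulSoup4 itself, the equivalent lookup would be along the lines of `soup.find_all("div", class_="weatherhistory_results_datavalue_temp")`.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; only the class name is taken from the slides.
SAMPLE_HTML = """
<div class="header">Mean Temperature</div>
<div class="weatherhistory_results_datavalue_temp">61.3</div>
<div class="footer">ignored</div>
"""


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every <div> that carries a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # greater than 0 while inside a matching <div>
        self.values = []    # one text entry per matching <div>

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # Nested <div> inside a match: track it so the end tags balance.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.values[-1] += data


def extract_values(html, target_class):
    """Return the stripped text of each <div> with the target class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


temps = extract_values(SAMPLE_HTML, "weatherhistory_results_datavalue_temp")
print(temps)
```

Everything outside the matching div is skipped, which is the point of the slide: only the key values are kept.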
Most data were duplicates
Using MS Excel, this was fixed with the click of a button
This reduced the data from 1700 values to 83
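The deduplication was done in Excel, but the same step could be scripted. The following is only a sketch of that alternative, using the csv module from the library list. The column name and station names are made up, and `drop_duplicate_rows` is a hypothetical helper.

```python
import csv
import io


def drop_duplicate_rows(rows):
    """Return rows with exact duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data standing in for the scraped file.
sample = io.StringIO("station\nStation A\nStation B\nStation A\nStation B\nStation C\n")
reader = csv.reader(sample)
header = next(reader)              # keep the header out of the comparison
unique_rows = drop_duplicate_rows(reader)
print(len(unique_rows))
```

Keeping the first occurrence preserves the original ordering, which matches what Excel's Remove Duplicates does.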
Due to inconsistent station names and lack of a quality geocoder, I used Google Maps to create a .kml file of the weather stations
This took 20-30 minutes
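Once the .kml file exists, the station names and coordinates can be read back out of it. The sketch below assumes the file stores each station as a Placemark with a Point, which is typical of a Google Maps export but is not confirmed by the slides. The sample KML, the station names, and `read_placemarks` are hypothetical. It uses only the standard library, not GDAL.

```python
import xml.etree.ElementTree as ET

# Standard KML 2.2 namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical two-station file; the real one had 83 stations.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Station A</name>
      <Point><coordinates>-121.5,38.5,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Station B</name>
      <Point><coordinates>-119.75,36.75,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def read_placemarks(kml_text):
    """Return (name, longitude, latitude) for every Placemark with a Point."""
    root = ET.fromstring(kml_text)
    stations = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS).strip()
        coords = placemark.findtext(
            ".//kml:Point/kml:coordinates", default="", namespaces=KML_NS
        ).strip()
        if not coords:
            continue  # skip placemarks without point geometry
        # KML stores coordinates as longitude,latitude[,altitude].
        lon, lat = (float(part) for part in coords.split(",")[:2])
        stations.append((name, lon, lat))
    return stations


stations = read_placemarks(SAMPLE_KML)
print(stations)
```

Note the longitude-first ordering in KML, which is easy to get backwards when joining the coordinates to the weather values.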
I chose 9 different weather variables to pull
The timespan is easily changed to create larger datasets
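One way to make the timespan easy to change is to pass the start and end dates in as parameters. This is a sketch under that assumption, not the original script. The `mean_temp` column, the station names, and the `fake_lookup` placeholder (which stands in for the actual scraping step) are all hypothetical.

```python
import csv
import io
from datetime import date, timedelta


def daterange(start, end):
    """Yield each date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def write_weather_csv(out, stations, start, end, lookup):
    """Write one row per station per day; return the number of data rows."""
    writer = csv.writer(out)
    writer.writerow(["station", "date", "mean_temp"])  # hypothetical columns
    count = 0
    for station in stations:
        for day in daterange(start, end):
            writer.writerow([station, day.isoformat(), lookup(station, day)])
            count += 1
    return count


def fake_lookup(station, day):
    """Placeholder for the scraping step; returns an empty value."""
    return ""


buffer = io.StringIO()  # stands in for an open output file
rows_written = write_weather_csv(
    buffer, ["Station A", "Station B"], date(2014, 1, 1), date(2014, 1, 3), fake_lookup
)
print(rows_written)
```

Widening the dataset then only means changing the two date arguments.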
The result of the Python scripts is a tool that can be used to collect weather datasets for CA for the last 30 years or so (as much as almanac.com has)
This dataset can be easily manipulated to display it in more meaningful ways
Remember, we started with this: