![Page 1: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/1.jpg)
Weka: A Useful Tool for Air Quality Forecasting
William F. Ryan
Department of Meteorology
The Pennsylvania State University
2007 National Air Quality Conference, Orlando
![Page 2: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/2.jpg)
Weka
The weka, or woodhen, is a bird native to New Zealand. Weka is also the name of a suite of machine learning software tools, written in Java and developed at the University of Waikato in New Zealand.
http://www.cs.waikato.ac.nz/ml/weka
![Page 3: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/3.jpg)
Machine Learning
• Machine learning is a subfield of artificial intelligence (AI) concerned with the development of algorithms and techniques that allow computers to "learn".
• The machine learning algorithms in Weka include, among others, linear regression, classification trees, clustering and artificial neural networks (ANN).
![Page 4: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/4.jpg)
Weka Can Be A Useful Tool
• Weka has the potential to be a useful tool to support local air quality forecasting efforts, particularly those operating on a limited budget.
– Weka is open source (free) software, although the purchase of the associated textbook is strongly recommended.
– Weka is easily installed on standard PCs but can also run on Linux and other platforms.
– Only minimal modifications are necessary to prepare data files for use in Weka.
– The user interface is simple and intuitive.
![Page 5: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/5.jpg)
Weka and PM2.5 Forecasting
• Of particular interest to air quality forecasters is the wide range of algorithms included in Weka.
• These algorithms may be useful to address shortcomings in statistical forecast guidance for fine particulate matter (PM2.5).
• Simple linear regression methods provide reasonable skill for O3 forecasting, due to the very strong and nearly linear ozone-temperature relationship, but linear regression methods have shown limited skill in forecasting PM2.5.
![Page 6: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/6.jpg)
PM2.5 Forecasting
O3 (left panel) is well-behaved statistically. Distribution is near normal with a strong association with maximum temperature. As a result, linear techniques are useful.
PM2.5 (right panel) is not well-behaved. Distribution is skewed, no strong association with any particular weather variable.
Tools included in Weka, including ANN and classification and regression trees (CART), are capable of addressing non-linear problems posed by PM2.5.
![Page 7: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/7.jpg)
Weka: Information
http://www.cs.waikato.ac.nz/ml/weka/
![Page 8: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/8.jpg)
Input File Format
Weka uses its own file format, called *.arff
All you need to do, though, is provide a *.csv file with variable names in the first line and Weka will convert it
![Page 9: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/9.jpg)
ARFF Format
The ARFF format is simple anyway:
ASCII file
List of variables and types
Then data follows, comma separated
Missing data marked as “?”
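The file layout described above (ARFF, Weka's attribute-relation file format) can be illustrated with a small converter. This is only a minimal sketch, not Weka's own converter: the function name `csv_to_arff`, the relation name, and the sample columns are hypothetical, and it assumes a column is numeric when every non-missing value parses as a number.

```python
import csv
import io


def csv_to_arff(csv_text, relation="air_quality"):
    """Convert CSV text (variable names in the first line) to ARFF text.

    Assumption: a column is numeric if every non-missing value parses as a
    float; otherwise it is declared as a nominal attribute. Attribute names
    are assumed to contain no spaces.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    # Header section: relation name, then one line per variable and type.
    lines = [f"@relation {relation}", ""]
    for col, name in enumerate(header):
        values = [row[col].strip() for row in data if row[col].strip()]
        if all(is_number(v) for v in values):
            lines.append(f"@attribute {name} numeric")
        else:
            labels = ",".join(sorted(set(values)))
            lines.append(f"@attribute {name} {{{labels}}}")

    # Data section: comma-separated rows, empty cells become "?".
    lines += ["", "@data"]
    for row in data:
        lines.append(",".join(cell.strip() or "?" for cell in row))
    return "\n".join(lines)


# Made-up sample values, for illustration only.
sample = "tmax,wind,pm25_class\n88.0,5.2,moderate\n91.5,,good\n"
print(csv_to_arff(sample))
```

The printed text shows the three parts listed above: the list of variables and types, the comma-separated data, and "?" where the wind value is missing.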
![Page 10: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/10.jpg)
Data Editing
Data can be easily edited within Weka itself
![Page 11: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/11.jpg)
Analyzing Data
Variables can be easily scanned with basic statistics and histograms provided by Weka
![Page 12: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/12.jpg)
Quick Analysis Tools
![Page 13: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/13.jpg)
Sampling and Test Data Set Options
![Page 14: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/14.jpg)
Functions Available
WEKA includes a number of different techniques that can be useful for forecast development.
These include:
Linear and logistic regression
Perceptron models (Neural networks)
![Page 15: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/15.jpg)
Linear Regression
Unfortunately, the “workhorse” linear regression module in Weka is limited in usefulness:
- No automatic stepwise function
- Poor diagnostics
Compare: SYSTAT, Minitab
![Page 16: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/16.jpg)
Classification and Regression Trees (CART)
A variety of classification algorithms are available.
Standard algorithm is J48, which is a souped-up Java implementation of C4.5, the last free version of that decision tree algorithm.
Commercial version is currently C5.0.
![Page 17: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/17.jpg)
CART Options
A number of options are available to fine-tune the CART analysis:
- Minimum # of cases per node
- Types of pruning: e.g., sub-tree raising
- Confidence values for splitting nodes
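The same options can also be set when J48 is run from the command line rather than from the GUI. The sketch below only assembles such a command as a list of strings; the helper `j48_command`, the jar location `weka.jar`, and the file name `pm25_train.arff` are hypothetical. It assumes the J48 flags `-M` (minimum number of cases per leaf), `-C` (confidence factor used in pruning), `-S` (turn off sub-tree raising) and `-t` (training file).

```python
def j48_command(train_file, min_cases=2, confidence=0.25,
                subtree_raising=True, weka_jar="weka.jar"):
    """Build a command line that runs Weka's J48 tree learner.

    The jar path and file names are placeholders; adjust them to the
    local installation.
    """
    cmd = ["java", "-cp", weka_jar, "weka.classifiers.trees.J48",
           "-t", train_file,           # training data in ARFF format
           "-M", str(min_cases),       # minimum # of cases per leaf
           "-C", str(confidence)]      # confidence factor for pruning
    if not subtree_raising:
        cmd.append("-S")               # disable sub-tree raising
    return cmd


# Example: a more heavily pruned tree with larger leaves.
print(" ".join(j48_command("pm25_train.arff", min_cases=10, confidence=0.1)))
```

Larger values of the minimum case count and smaller confidence values generally give smaller trees, which can be easier to fold into forecast reasoning.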
![Page 18: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/18.jpg)
CART Diagnostics
CART is notorious for using CPU resources, but the WEKA version runs efficiently on my standard PC.
Diagnostics are better for CART than for linear regression.
Example on left is of a 4-category PM2.5 CART forecast.
![Page 19: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/19.jpg)
CART Visualization
![Page 20: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/20.jpg)
Artificial Neural Networks (ANN)
“Linear Regression by a mob”
Produces forecast by taking the weighted sum of predictors and then layering the process.
![Page 21: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/21.jpg)
Artificial Neural Networks - Summary
Known samples (historical data) are used to “train” the network.
Input data (xi) are assigned weights (wi) and combined in the “hidden” layer, like a set of linear regressions. These sets are then combined in additional layers, like regressions of regressions.
The weighted sum of the inputs is transformed (“squashed”) to the range of the training data and the error is measured.
A supervised training algorithm uses output error to adjust network weights to minimize errors.
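The steps summarized above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Weka's implementation: it uses one hidden layer, a sigmoid as the squashing function (so targets are assumed to be scaled to the range 0 to 1), and simple per-sample gradient descent. All function names and the sample values are made up for the example.

```python
import math
import random


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Each hidden unit squashes a weighted sum of the inputs; the output
    squashes a weighted sum of the hidden values. The last weight in each
    list is a bias term."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x + [1.0])))
              for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output


def mean_squared_error(samples, w_hidden, w_out):
    errors = [(t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in samples]
    return sum(errors) / len(errors)


def train(samples, n_hidden=3, learning_rate=0.5, epochs=2000, seed=0):
    """Use the output error to adjust the weights (supervised training)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, target in samples:
            hidden, output = forward(x, w_hidden, w_out)
            # Output error, scaled by the slope of the sigmoid.
            delta_out = (target - output) * output * (1.0 - output)
            # Share of that error attributed to each hidden unit.
            delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] += learning_rate * delta_out * h
            for j, ws in enumerate(w_hidden):
                for i, xi in enumerate(x + [1.0]):
                    ws[i] += learning_rate * delta_hidden[j] * xi
    return w_hidden, w_out


# Made-up toy samples: two scaled predictors and a 0-1 target.
toy = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
       ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
weights = train(toy)
print(mean_squared_error(toy, *weights))
```

Calling `train(toy, epochs=0)` returns the untrained starting weights, which makes it easy to compare the error before and after training.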
![Page 22: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/22.jpg)
Artificial Neural Networks – Pros/Cons
• Pro: ANNs are a powerful technique utilized across scientific disciplines.
• Pro: Theoretically well suited to non-linear processes like air quality.
• Con: Not transparent to users. Hard to integrate into forecast thinking.
• Con: Technically difficult to understand, which raises the risk of misuse.
![Page 23: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/23.jpg)
Example: Neural Network Structure
www.doc.ic.ac.uk/~sgc/teaching/v231/
![Page 24: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/24.jpg)
WEKA Neural Networks
WEKA provides user control of training parameters:
# of iterations or epochs (“training time”)
Increment of weight adjustments in backpropagation (“learning rate”)
Controls on varying changes to increments (“momentum”)
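How these three parameters interact can be seen on a one-weight toy problem. This is a hedged sketch of a standard gradient-descent update with momentum, not Weka's code: the function `minimize`, the parameter values, and the quadratic error curve are chosen only for illustration.

```python
def minimize(gradient_fn, start, epochs=500, learning_rate=0.3, momentum=0.2):
    """Gradient descent with momentum on a single weight.

    epochs        -- number of passes ("training time")
    learning_rate -- size of each weight adjustment
    momentum      -- fraction of the previous adjustment carried forward
    """
    weight, velocity = start, 0.0
    for _ in range(epochs):
        # New step = part of the previous step plus a step down the slope.
        velocity = momentum * velocity - learning_rate * gradient_fn(weight)
        weight += velocity
    return weight


# Toy error curve E(w) = (w - 3)^2, whose slope is 2 * (w - 3).
best = minimize(lambda w: 2.0 * (w - 3.0), start=0.0)
print(best)
```

On this curve the weight settles close to 3, the point of minimum error. Too large a learning rate makes the steps overshoot, while too few epochs stop the search before it settles.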
![Page 25: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/25.jpg)
Conclusions
• Weka is a low-cost forecasting tool that has the potential to be useful for air quality forecasting, particularly in situations where non-linear effects dominate.
• Some Weka modules are not fully developed for forecast algorithm development.
• Patience and use of the textbook and the Weka listserv are required to get the most out of the program.
![Page 26: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/26.jpg)
URLs of Interest
• Weka:
– http://www.cs.waikato.ac.nz/ml/weka
• Mailing List:
– https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
• Mailing List Archives:
– https://list.scms.waikato.ac.nz/mailman/htdig/wekalist/
• Informal FAQ:
– http://www.public.asu.edu/~sksinghi/weka-faq.html
![Page 27: WEKA: A Useful Tool for Air Quality Forecasting](https://reader031.vdocument.in/reader031/viewer/2022020207/54c63cc54a7959ed7f8b4570/html5/thumbnails/27.jpg)
Acknowledgements
• The Delaware Valley Regional Planning Commission (DVRPC) – Mike Boyer and Sean Greene – and the member states (PA, DE and NJ) for supporting air quality forecast development.
• Dr. George Young of Penn State for his advice, patience and teaching skill.