Motivation
Dataset
Project Scope
Data Warehousing
Data Mining
Conclusion
Learning Experience
References
Research 1:
Public school funding is the largest program in the state budget, receiving more than 40% of the state's General Fund resources. The 2014–15 state budget includes more than $45 billion in General Fund resources.
- California Department of Education
Research 2:
Since 1980, higher education spending has decreased by 13 percent in inflation-adjusted dollars, whereas spending on California’s prisons and associated correctional programs has skyrocketed by 436 percent.
- The Huffington Post, California
California state government financial data reported by counties, cities and districts, with billions of records in each file.
This dataset details the expenditures, revenues and state income of all departments, generated in the form of fees, penalties and taxes.
https://bythenumbers.sco.ca.gov/
To provide key financial information on government funding and income, broken down by region and department.
Target users: citizens, taxpayers, students,
businesses, non-profit organizations
What is the State Income based on County, City and District?
Which business categories and sub-departments collect the most income?
Determine the expenditures for a particular department.
How much has your county spent on public safety in the past 4 years?
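A question like the last one maps naturally onto a grouped aggregate query. The sketch below is only an illustration: it uses Python's built-in sqlite3 as a stand-in for the project's MySQL database, and the table name `expenditures`, its column names, and the figures are all hypothetical.

```python
import sqlite3

def public_safety_spending(conn, county, first_year, last_year):
    """Return {fiscal_year: total amount} of public safety spending for one county."""
    rows = conn.execute(
        """
        SELECT fiscal_year, SUM(amount)
        FROM expenditures
        WHERE county = ?
          AND department = 'Public Safety'
          AND fiscal_year BETWEEN ? AND ?
        GROUP BY fiscal_year
        ORDER BY fiscal_year
        """,
        (county, first_year, last_year),
    ).fetchall()
    return dict(rows)

# Tiny in-memory table with made-up figures, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE expenditures "
    "(county TEXT, department TEXT, fiscal_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO expenditures VALUES (?, ?, ?, ?)",
    [
        ("Sacramento", "Public Safety", 2013, 100.0),
        ("Sacramento", "Public Safety", 2013, 50.0),
        ("Sacramento", "Public Safety", 2014, 200.0),
        ("Sacramento", "Public Health", 2014, 999.0),
        ("Los Angeles", "Public Safety", 2014, 400.0),
    ],
)

spending = public_safety_spending(conn, "Sacramento", 2011, 2014)
```

Passing the county and year range as query parameters, rather than formatting them into the SQL string, keeps the same query reusable for any county a user selects.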
Original data: three different files in .csv format
Handled missing values and listed required attributes for our project
Data integration and data reduction to relevant records
To relate the three datasets to one another, we created extra attributes and identifiers
Loaded the tables into a MySQL database
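The cleaning steps listed here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual preprocessing code: the attribute names in `REQUIRED`, the `source` and `record_id` fields, and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical names for the attributes the project kept.
REQUIRED = ["county", "department", "fiscal_year", "amount"]

def clean_rows(csv_text, source):
    """Keep required attributes, drop incomplete rows, and tag each row with an identifier."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        values = {name: (row.get(name) or "").strip() for name in REQUIRED}
        if not all(values.values()):
            continue  # skip rows with a missing or blank required value
        values["source"] = source  # extra attribute: which file the row came from
        values["record_id"] = f"{source}-{len(cleaned) + 1}"  # extra identifier
        cleaned.append(values)
    return cleaned

# Made-up sample standing in for one of the three .csv files.
sample = """county,department,fiscal_year,amount,notes
Sacramento,Public Safety,2013,100,ok
Sacramento,,2013,50,blank department
Yolo,Health,2014,,missing amount
Yolo,Health,2014,75,ok
"""

rows = clean_rows(sample, "expenditures")
```

Running the same function over each of the three files with a different `source` value gives rows that share one layout and can be loaded into a common table.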
Large dataset with approximately 20 billion records
Departments with invalid or blank values were eliminated to keep our records consistent
Removed the least relevant sub-categories; the primary focus was on income and expenditure
Demo!
Link to our Data Mart: http://athena.ecs.csus.edu/~appanap/
Q. A star schema has what type of relationship between a dimension and fact table?
a) Many to many
b) One to one
c) One to many
d) All of the above
A classification algorithm is used to classify counties and departments into Loss and Profit classes
Prediction for deciding value ranges for year 2014
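The slides do not say how the Loss and Profit classes were defined. One plausible rule, shown here purely as an assumption, is to compare revenue with expenditure for each county and department pair. The field names and figures are hypothetical.

```python
def label_class(revenue, expenditure):
    """Assumed rule: 'Profit' when revenue covers expenditure, otherwise 'Loss'."""
    return "Profit" if revenue >= expenditure else "Loss"

# Made-up records, for illustration only.
records = [
    {"county": "Sacramento", "department": "Public Safety", "revenue": 120.0, "expenditure": 150.0},
    {"county": "Sacramento", "department": "Parking", "revenue": 80.0, "expenditure": 30.0},
    {"county": "Yolo", "department": "Health", "revenue": 50.0, "expenditure": 50.0},
]

labels = {
    (r["county"], r["department"]): label_class(r["revenue"], r["expenditure"])
    for r in records
}
```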
Used the three combined datasets from our Data Mart application
Maintained required attributes for mining in CSV format
Converted numeric column to nominal i.e. values to ranges
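Converting a numeric column to nominal ranges is a binning step. The sketch below shows one way to do it in plain Python; the bin edges are arbitrary illustrative values, not the ranges used in the project.

```python
from bisect import bisect_right

def to_range(value, edges):
    """Map a number to a nominal range label, given bin edges sorted in ascending order.

    A label such as '1000-10000' covers values from 1000 up to, but not including, 10000.
    """
    i = bisect_right(edges, value)
    if i == 0:
        return f"<{edges[0]}"
    if i == len(edges):
        return f">={edges[-1]}"
    return f"{edges[i - 1]}-{edges[i]}"

# Illustrative edges only; a real choice would depend on the data distribution.
edges = [1000, 10000, 100000]
amounts = [500, 1000, 50000, 100000]
nominal = [to_range(a, edges) for a in amounts]
```

Tree learners such as J48 can split on numeric attributes directly, so binning mainly matters when the value being predicted is itself numeric and has to become a class label.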
Challenges:
Algorithm selection for the dataset
Large dataset
Creating classes for important categories
Classification Tree: J48
Tools: Weka, Tableau
Used data of 2010 to 2013 as training data
Data of year 2014 is used as Test data to check prediction accuracy
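The split by year and the accuracy check can be sketched as follows. Note that the model here is a deliberately simple stand-in (the most common class per county), not J48, which was run in Weka. The record fields and labels are made up for illustration.

```python
from collections import Counter, defaultdict

def split_by_year(records, test_year):
    """Train on all years before test_year, test on test_year itself."""
    train = [r for r in records if r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test

def fit_majority(train):
    """Stand-in model (not J48): the most common class per county in the training data."""
    counts = defaultdict(Counter)
    for r in train:
        counts[r["county"]][r["label"]] += 1
    return {county: c.most_common(1)[0][0] for county, c in counts.items()}

def accuracy(model, test, default="Loss"):
    """Fraction of test records whose predicted class matches the actual class."""
    correct = sum(model.get(r["county"], default) == r["label"] for r in test)
    return correct / len(test)

# Made-up class labels for 2010 to 2014, for illustration only.
history = {
    "A": ["Profit", "Profit", "Loss", "Profit", "Profit"],
    "B": ["Loss", "Loss", "Loss", "Profit", "Profit"],
}
records = [
    {"county": county, "year": 2010 + i, "label": label}
    for county, labels in history.items()
    for i, label in enumerate(labels)
]

train, test = split_by_year(records, 2014)
model = fit_majority(train)
score = accuracy(model, test)
```

Splitting by year rather than at random keeps 2014 data out of training, so the score reflects how well past years predict a future one.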
As we can see, the prediction is almost accurate. The predicted class for 2014 has approximately the same values and range as the actual values above.
Using Tableau, we were able to visualize and explore the following:
What are the top 5 revenue- and expenditure-generating counties?
Financial data comparison between Sacramento and Los Angeles
Funds distribution for flood control
Government expenditures on public facilities like roads, parking facilities etc.
Government expenses on public health
Comparison of government expenses on prisons and education
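The rankings in this list were produced in Tableau. For readers without it, a "top 5 counties" view reduces to summing amounts per county and sorting, as in this minimal sketch with made-up figures.

```python
from collections import defaultdict

def top_counties(rows, n=5):
    """Sum amounts per county and return the n largest as (county, total) pairs."""
    totals = defaultdict(float)
    for county, amount in rows:
        totals[county] += amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Made-up (county, revenue) rows, for illustration only.
revenue_rows = [
    ("Los Angeles", 900.0),
    ("Sacramento", 300.0),
    ("Los Angeles", 100.0),
    ("Yolo", 50.0),
    ("Sacramento", 250.0),
]

top_two = top_counties(revenue_rows, n=2)
```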
Q. Which of the following is not a data mining functionality?
a) Characterization and Discrimination
b) Classification and regression
c) Selection and interpretation
d) Clustering and Analysis
Data Warehouse design:
PHP, HTML/CSS, JavaScript
Database:
MySQL
Data Mining tools:
WEKA
Data Visualization:
Tableau
Learned how to design a data mart application
Learned different data mining tools like Weka, RapidMiner and Tableau
Learned the practical usage of classification algorithms such as J48 and Naïve Bayes, as well as the correlation matrix
Teamwork and brainstorming really helped us resolve issues in the execution of our project
California State Controller’s Office, Government Financial Reports:
https://bythenumbers.sco.ca.gov/browse?utf8=%E2%9C%93&page=1
California Department of Education:
http://www.cde.ca.gov/fg/fr/eb/
California Drought:
http://drought.ca.gov/topstory/top-story-58.html
California Spending More On Prisons Than Colleges, Report:
http://www.huffingtonpost.com/2012/09/06/california-prisons-colleges_n_1863101.html