Purpose: to Discover and Predict Trends of Oregon Graduates from SOU:
1. Who graduates and who doesn’t – why?
2. Do some majors tend to have more of one gender than another – why?
3. Does economic / cultural background influence choice of major – why?
4. What do we see within age groupings – do certain age groupings gravitate toward certain majors and not others?
5. And finally, can we predict gender from the attributes of their major, age grouping, county, economic status, transfer status, and year graduated?
Data Modeling Tool Used: WEKA
• Classification / Prediction: WEKA decision tree (70% accuracy) predicted gender from the attributes major, age grouping, year of graduation, county, economic status, and transfer status.
• Clustering (108 clusters): visually shows trend patterns for combinations of attributes
Data
• 12,982 records of SOU graduates from 1990 to present
– 10,491 for training
– 2,491 for testing
• Attributes
– PID
– Year graduated {1990 – 2006}
– Transfer student or not {Y,N}
– County {36}
– Economic county status Distressed {1,2,3}
– Age (Discretized into 7 categories)
– Major {1 - 297}
– Gender {F,M}
Attribute: major (297)
@attribute MAJOR1_CODE {GSHU,COMS,HPOL,GSHF,GSIN,GSSM,MSBA,SEFR,SEMT,CSIS,MUCN,SEAT,SEMU,SES,SESP,EESP,SEBI,SEIS,SAAS,ABFA,ACA,AFE,ANTH,ANTP,ARLP,ARLT,ART,ARTH,ARTP,BA,BACH,BAHR,HR,BAMG,BAMK,BAMT,BAMU,BANM,BAOM,BAOP,BAPB,BAPH,BAPM,BARM,BASB,BCHP,BED,BIO,BIOH,BIOP,BMTP,BMUP,BOTC,BUSP,CBIS,CCJ,CHAC,CHBA,CHBI,CHEM,CHEP,CHPA,CIM,COJO,CJOP,CMHR,CMM,COMM,COMP,COBR,COTE,CRIM,CRIP,CRM,CS,CSG,CSIA,CSIN,CSMA,CSP,CSPS,ECD,ECEL,ECON,ECOP,ECTL,ED1P,ED2P,ED3P,ED4P,ED5P,ED6P,ED7P,ED8P,ED9P,EDEC,EDMS,EDP,EDUC,EE,EECI,EECT,EEHE,EEEC,EEHC,EEHL,EERE,EESB,EESL,ESP,EESU,EETS,EIAL,ELED,ELMS,EMAT,EMBE,EMBI,EMCH,EMDR,EMFR,EMGE,EMHE,EMIS,EMLA,BAAC,EMMT,EMMU,EMPE,EMPH,EMS,EMSP,EMSS,ENG,ENGL,ENGP,ENGR,ENGW,ES,ESB,ESC,ESG,ESGR,ESSP,FPA,GEGP,GEOG,GEOL,GEOP,GSBE,GSSS,HISP,HIST,HPAT,HPE,HPHP,HPP,HPPE,HPHS,HSP,HUM,INDP,INTD,INTP,INTS,LAFP,LAGP,LANC,LANF,LANG,LANS,LASP,MACS,MAP,MATH,MBA,MECI,MEEC,MERE,MESP,MHAT,MHBE,MHBI,MHCH,MHDR,MHFR,MHGE,MHHE,MHHP,MHIS,MHLA,MHMT,MHMU,MHPE,MHPH,MHS,MHSP,MHSS,MIM,MIMP,MMC,MMST,MSSP,MTAT,MTBE,MTBI,MTCH,MTDR,MTFR,MTGE,MTHE,MTHH,MTHP,MTIS,MTLA,MTMT,MTMU,MTPE,MTPH,MTRE,MTS,MTSE,MTSP,MTSS,MUIN,MUPF,MUS,MUSP,NAAM,NURP,NURS,PCHM,PCJO,PDEM,PDEN,PDHY,PEGR,PHR,PHYA,PHYP,PHYS,PLAW,PMED,PMET,POLP,POLS,POPT,POTH,PPAS,PPHA,PPTH,PRAM,PSY,PSY2,PSY3,PSY4,PSY5,PSY6,PSY7,PSYA,PSYC,PSYP,PVET,SCI,SCIP,SCTL,SEBE,SED,SEHC,SEHE,SEHL,SEHU,SELA,SEPE,SERE,SESB,SESL,SESM,SESS,SESU,SETS,SOC,SOLP,SPAN,SSCD,SSCI,SSCR,SSHS,SSPD,SSPS,TAFA,TBFA,TEAC,THAR,THEA,THEP,UNDL}
Decision Tree splits: major, county, transfer, age, graduation year, gender
10,491 Training Set, 2,491 Test Set
Total = 12,982
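The slides give only the sizes of the two sets, not how the records were assigned to them. As a minimal sketch, assuming a simple random split with a fixed seed (an assumption, not the documented method), the 10,491 / 2,491 partition could be reproduced like this. The function name `split_records` and the stand-in record IDs are hypothetical.

```python
import random


def split_records(records, n_train, seed=0):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in record IDs; the real graduate rows are not reproduced here.
records = range(12982)
train, test = split_records(records, n_train=10491)
print(len(train), len(test))
```

Every record lands in exactly one of the two sets, so the sizes always add back up to the total.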
Test data DT
Findings:
• Gender (F/M) could be predicted from major and the other attributes with about 70% accuracy (by following the decision tree you can trace the branching to see how the attributes are classified and how they relate)
• But other interesting patterns were found using clustering (especially socio-economic ones)
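To make the accuracy figure concrete, here is a minimal sketch of a much simpler stand-in for the WEKA decision tree: a single split on major that predicts the most common gender seen for each major, then measures accuracy on held-out rows. It is not the model used in the slides, and the rows below are made up for illustration only. The names `fit_majority_by_major` and `accuracy` are hypothetical.

```python
from collections import Counter, defaultdict


def fit_majority_by_major(rows):
    """Learn the most common gender for each major code."""
    counts = defaultdict(Counter)
    for major, gender in rows:
        counts[major][gender] += 1
    return {major: c.most_common(1)[0][0] for major, c in counts.items()}


def accuracy(model, rows, default="F"):
    """Share of rows whose gender matches the model's guess for the major."""
    hits = sum(model.get(major, default) == gender for major, gender in rows)
    return hits / len(rows)


# Made-up rows for illustration only; not taken from the SOU data.
train_rows = [("CS", "M"), ("CS", "M"), ("CS", "F"),
              ("ENG", "F"), ("ENG", "F"), ("ENG", "M")]
test_rows = [("CS", "M"), ("ENG", "F"), ("ENG", "M"), ("CS", "F")]

model = fit_majority_by_major(train_rows)
print(model, accuracy(model, test_rows))
```

A full decision tree keeps splitting on further attributes (county, transfer, age, graduation year) below the major split; this sketch stops after the first level.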
To discover socio-economic correlations I added one attribute not in the original data:
• Added Distressed_County attribute (economic status)
– 1. Non Distressed
– 2. Distressed
– 3. Severely Distressed
• And discretized the Age attribute into 6 classifications (birth years, with ages in parentheses)
– 1909 – 1939 (67 - 97)
– 1940 – 1949 (57 - 66)
– 1950 – 1959 (47 - 56)
– 1960 – 1969 (37 - 46)
– 1970 – 1979 (27 - 36)
– 1980 – 1986 (20 - 26)
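The age discretization above can be sketched as a lookup over the six birth-year ranges. This assumes the raw attribute is a birth year, which the slide's ranges suggest but do not state outright. The names `AGE_BINS` and `age_group` are hypothetical.

```python
# Bin edges copied from the slide: (first birth year, last birth year).
AGE_BINS = [
    (1909, 1939),
    (1940, 1949),
    (1950, 1959),
    (1960, 1969),
    (1970, 1979),
    (1980, 1986),
]


def age_group(birth_year):
    """Return the label of the bin that contains the given birth year."""
    for first, last in AGE_BINS:
        if first <= birth_year <= last:
            return f"{first}-{last}"
    raise ValueError(f"birth year {birth_year} is outside all bins")


print(age_group(1985))
print(age_group(1952))
```

Raising an error for out-of-range years, rather than guessing a bin, makes bad input visible before it reaches the clustering step.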
Oregon counties: economic health
• http://www.gonorthwest.com/Oregon/Oregon-cities.htm
3 = Severely Distressed (all are rural)
2 = Distressed (non-metro, except Marion)
1 = Not Distressed
I based the Distressed attribute on:
Map of Counties (socio-economic)
1. Red: distressed (rural)
2. Yellow: non-metro (except Marion)
3. Blue: not distressed
http://www.answers.com/topic/list-of-counties-in-oregon
Distressed counties
County      Index  Economic Status
Baker       0.64   Severely Distressed
Columbia    0.84   Distressed
Coos        0.77   Severely Distressed
Crook       0.85   Distressed
Curry       0.93   Distressed
Douglas     0.71   Severely Distressed
Gilliam     0.74   Severely Distressed
Grant       0.56   Severely Distressed
Harney      0.31   Severely Distressed
Hood River  0.87   Distressed
Jefferson   0.82   Distressed
Josephine   0.82   Distressed
Klamath     0.73   Severely Distressed
Lake        0.67   Severely Distressed
Lincoln     0.95   Distressed
Linn        0.76   Severely Distressed
Malheur     0.49   Severely Distressed
Marion      0.96   Distressed
Morrow      0.81   Distressed
Sherman     0.65   Severely Distressed
Umatilla    0.63   Severely Distressed
Union       0.79   Severely Distressed
Wallowa     0.77   Severely Distressed
Wasco       0.79   Severely Distressed
Wheeler     0.94   Distressed
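The table can be turned into the 1/2/3 Distressed_County code with a small lookup. This is a sketch under one assumption: any Oregon county not listed in the table is coded 1 (Not Distressed). The names `SEVERELY_DISTRESSED`, `DISTRESSED` and `distress_code` are hypothetical.

```python
# County sets copied from the "Distressed counties" table.
SEVERELY_DISTRESSED = {
    "Baker", "Coos", "Douglas", "Gilliam", "Grant", "Harney", "Klamath",
    "Lake", "Linn", "Malheur", "Sherman", "Umatilla", "Union", "Wallowa",
    "Wasco",
}
DISTRESSED = {
    "Columbia", "Crook", "Curry", "Hood River", "Jefferson", "Josephine",
    "Lincoln", "Marion", "Morrow", "Wheeler",
}


def distress_code(county):
    """Map a county name to 3 (severely distressed), 2 (distressed) or 1."""
    if county in SEVERELY_DISTRESSED:
        return 3
    if county in DISTRESSED:
        return 2
    return 1  # assumption: counties absent from the table are not distressed


print(distress_code("Harney"), distress_code("Marion"), distress_code("Jackson"))
```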
County Economic Ranking
Rank  National Rank  County             Per Capita Income  Median Household Income
1     117            Clackamas County   $25,973            $52,080
2     145            Washington County  $24,969            $52,122
3     269            Multnomah County   $22,606            $41,278
4     335            Benton County      $21,868            $41,897
5     348            Deschutes County   $21,767            $41,847
6     563            Columbia County    $20,078            $45,797
7     634            Lane County        $19,681            $36,942
8     670            Clatsop County     $19,515            $36,301
9     674            Jackson County     $19,498            $36,461
10    715            Polk County        $19,282            $42,311
11    773            Tillamook County   $19,052            $34,269
12    795            Yamhill County     $18,951            $44,111
13    864            Lincoln County     $18,692            $32,769
14    958            Marion County      $18,408            $40,314
15    1042           Curry County       $18,138            $30,117
16    1148           Hood River County  $17,877            $38,326
17    1240           Gilliam County     $17,659            $33,611
18    1254           Linn County        $17,633            $37,518
19    1284           Coos County        $17,547            $31,542
20    1324           Sherman County     $17,448            $35,142
21    1398           Wallowa County     $17,276            $32,129
22    1416           Josephine County   $17,234            $31,229
23    1433           Wasco County       $17,195            $35,959
24    1591           Union County       $16,907            $33,738
25    1594           Crook County       $16,899            $35,186
26    1640           Grant County       $16,974            $32,560
27    1674           Klamath County     $16,719            $31,537
28    1730           Douglas County     $16,581            $33,223
29    1804           Umatilla County    $16,410            $36,249
30    1922           Harney County      $16,159            $30,957
31    1929           Lake County        $16,136            $29,506
32    2045           Wheeler County     $15,884            $28,750
33    2086           Morrow County      $15,802            $37,521
34    2142           Jefferson County   $15,675            $35,853
35    2167           Baker County       $15,612            $30,367
36    2762           Malheur County     $13,895            $30,241
Most interesting finding:
From 1990 to 2006, far more graduates came from non-distressed counties. However, the ratio of graduates to non-graduates within each grouping differs sharply from one grouping to the next. When you compare against the ratio of students from non-distressed counties who graduate, you see a predominant trend:
Students from distressed, and especially from severely distressed, counties who make it to SOU graduate.
Speculating the Reason: Financial Motivation
Education = Increased Income
Non-transfer students were the predominant graduates
Jackson, Jefferson, Josephine and Klamath accounted for the transfer students
It looks like graduates coming from a distance know they want to attend SOU right out of high school.
Classified by major and transfer/non-transfer: There was no indication of any particular major being the motivation; however, our (state) tuition is relatively low, a possible motivator.
Other trends that were noted:
Male (right) / Female (left) ratio is about the same per economic stratum
The age groupings by gender are fairly equal
Graduates tend to be older students
Top to bottom age: 1980 – 1986 (20 - 26), 1970 – 1979 (27 - 36), 1960 – 1969 (37 - 46), 1950 – 1959 (47 - 56), 1940 – 1949 (57 - 66), 1909 – 1939 (67 - 97)
Major was the first split in the decision tree. Clustering showed general trends: males were ‘sparse’ among English graduates, and female graduates were ‘sparse’ in all years within the CS programming track (82% M / 17% F). The same held in CSIS (79% M, 21% F) and in the rest, categorized as ‘general CS’ (92% M, 8% F), for a total across all tracks of 81% M, 19% F.
The 108 clusters clearly show the disparity in graduates from certain severely economically distressed counties
Age Groupings and Counties
Top to bottom age: 1980 – 1986 (20 - 26), 1970 – 1979 (27 - 36), 1960 – 1969 (37 - 46), 1950 – 1959 (47 - 56), 1940 – 1949 (57 - 66), 1909 – 1939 (67 - 97)
Jitter pulled back to show our near neighbors (bottom): Douglas, Jackson, Josephine, Klamath
Age Groupings of Near Neighbor Graduates
Left to right age: 1909 – 1939 (67 - 97), 1940 – 1949 (57 - 66), 1950 – 1959 (47 - 56), 1960 – 1969 (37 - 46), 1970 – 1979 (27 - 36), 1980 – 1986 (20 - 26)
Near Neighbor is the X axis
Transfer yes/no is the Y axis
(Distressed) Josephine County produced one female CSIN major graduate (in the year 2000). This is not definitive: I was clicking on instances to see what I could find and could have missed another female graduate from this county.
Gender is 50/50 for Near Neighbor Graduates
Listing by County, Number of Graduates
Max-min number of graduates: 4854 Jackson (near neighbor), 581 Multnomah (distant)
Others: Wheeler 0 and Gilliam 2 (both distant), Lake 39, Harney 25, Grant 9, Malheur 22, Crook 20
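A listing like this is easy to rank programmatically. The sketch below uses only the county counts quoted on this slide (the other counties are not listed, so they are left out) and sorts them from most to fewest graduates. The variable names are hypothetical.

```python
# Graduate counts as quoted on the slide; counties not listed there are omitted.
graduates = {
    "Jackson": 4854,
    "Multnomah": 581,
    "Wheeler": 0,
    "Gilliam": 2,
    "Lake": 39,
    "Harney": 25,
    "Grant": 9,
    "Malheur": 22,
    "Crook": 20,
}

# Sort by count, largest first.
ranked = sorted(graduates.items(), key=lambda item: item[1], reverse=True)
for county, count in ranked:
    print(f"{county:<10}{count:>6}")
```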