Purpose: to Discover and Predict Trends of Oregon Graduates from SOU


Upload: astro

Post on 08-Jan-2016

Page 1: Purpose: to Discover and Predict  Trends of Oregon Graduates from SOU:

Purpose: to Discover and Predict

Trends of Oregon Graduates from SOU:

1. Who graduates and who doesn’t – why?

2. Do some majors tend to have more of one gender than another – why?

3. Does economic / cultural background influence choice of major – why?

4. What do we see within age groupings – do certain age groupings gravitate toward certain majors and not others?

5. And finally, can we predict gender from the attributes of their major, age grouping, county, economic status, transfer status, and year graduated?

Page 2:

Data Modeling Tool Used: WEKA

• Classification / Prediction: WEKA decision tree (70% accuracy) predicted gender based on attributes of major, age grouping, year of graduation, county, economic status, transfer student.

• Clustering (108 clusters): visually shows patterns and trends for combinations of attributes

Page 3:

Data

• 12,982 records of SOU graduates from 1990 to present

– 10,491 for training

– 2,491 for testing

• Attributes

– PID

– Year graduated {1990 – 2006}

– Transfer student or not {Y,N}

– County {36}

– County economic status (Distressed) {1,2,3}

– Age (Discretized into 7 categories)

– Major {1 - 297}

– Gender {F,M}
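The slides give the sizes of the training and test sets but not how the records were divided. As a minimal sketch, assuming a simple seeded random shuffle (an assumption, not something the slides state), the 10,491 / 2,491 split could look like this. The function name `split_records` and the stand-in record IDs are hypothetical.

```python
import random


def split_records(records, n_train, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    if not 0 <= n_train <= len(records):
        raise ValueError("n_train must be between 0 and len(records)")
    shuffled = list(records)
    # Assumption: a seeded random shuffle; the slides do not describe the method.
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in IDs only; the real records carry the attributes listed above.
records = list(range(12982))
train, test = split_records(records, n_train=10491)
print(len(train), len(test))  # 10491 2491
```

Fixing the seed keeps the split reproducible, so the same test records are held out on every run.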

Page 4:

Attribute: major (297)

@attribute MAJOR1_CODE{GSHU,COMS,HPOL,GSHF,GSIN,GSSM,MSBA,SEFR,SEMT,CSIS,MUCN,SEAT,SEMU,SES,SESP,EESP,SEBI,SEIS,SAAS,ABFA,ACA,AFE,ANTH,ANTP,ARLP,ARLT,ART,ARTH,ARTP,BA,BACH,BAHR,HR,BAMG,BAMK,BAMT,BAMU,BANM,BAOM,BAOP,BAPB,BAPH,BAPM,BARM,BASB,BCHP,BED,BIO,BIOH,BIOP,BMTP,BMUP,BOTC,BUSP,CBIS,CCJ,CHAC,CHBA,CHBI,CHEM,CHEP,CHPA,CIM,COJO,CJOP,CMHR,CMM,COMM,COMP,COBR,COTE,CRIM,CRIP,CRM,CS,CSG,CSIA,CSIN,CSMA,CSP,CSPS,ECD,ECEL,ECON,ECOP,ECTL,ED1P,ED2P,ED3P,ED4P,ED5P,ED6P,ED7P,ED8P,ED9P,EDEC,EDMS,EDP,EDUC,EE,EECI,EECT,EEHE,EEEC,EEHC,EEHL,EERE,EESB,EESL,ESP,EESU,EETS,EIAL,ELED,ELMS,EMAT,EMBE,EMBI,EMCH,EMDR,EMFR,EMGE,EMHE,EMIS,EMLA,BAAC,EMMT,EMMU,EMPE,EMPH,EMS,EMSP,EMSS,ENG,ENGL,ENGP,ENGR,ENGW,ES,ESB,ESC,ESG,ESGR,ESSP,FPA,GEGP,GEOG,GEOL,GEOP,GSBE,GSSS,HISP,HIST,HPAT,HPE,HPHP,HPP,HPPE,HPHS,HSP,HUM,INDP,INTD,INTP,INTS,LAFP,LAGP,LANC,LANF,LANG,LANS,LASP,MACS,MAP,MATH,MBA,MECI,MEEC,MERE,MESP,MHAT,MHBE,MHBI,MHCH,MHDR,MHFR,MHGE,MHHE,MHHP,MHIS,MHLA,MHMT,MHMU,MHPE,MHPH,MHS,MHSP,MHSS,MIM,MIMP,MMC,MMST,MSSP,MTAT,MTBE,MTBI,MTCH,MTDR,MTFR,MTGE,MTHE,MTHH,MTHP,MTIS,MTLA,MTMT,MTMU,MTPE,MTPH,MTRE,MTS,MTSE,MTSP,MTSS,MUIN,MUPF,MUS,MUSP,NAAM,NURP,NURS,PCHM,PCJO,PDEM,PDEN,PDHY,PEGR,PHR,PHYA,PHYP,PHYS,PLAW,PMED,PMET,POLP,POLS,POPT,POTH,PPAS,PPHA,PPTH,PRAM,PSY,PSY2,PSY3,PSY4,PSY5,PSY6,PSY7,PSYA,PSYC,PSYP,PVET,SCI,SCIP,SCTL,SEBE,SED,SEHC,SEHE,SEHL,SEHU,SELA,SEPE,SERE,SESB,SESL,SESM,SESS,SESU,SETS,SOC,SOLP,SPAN,SSCD,SSCI,SSCR,SSHS,SSPD,SSPS,TAFA,TBFA,TEAC,THAR,THEA,THEP,UNDL}

Decision Tree splits: major, county, transfer, age, graduation year, gender

10,491 Training Set, 2,491 Test Set

Total = 12,982

Page 5:

Test data: decision tree (DT)

Page 6:

Findings:

• Gender (F/M) could be predicted for majors with 70% accuracy (by following the decision tree you can trace the branching to see how the attributes are classified and how they relate)

• But there were other interesting patterns found using clustering (especially socio-economic)
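For reference, accuracy here means the share of records whose predicted gender matches the recorded gender. The sketch below shows that calculation on a small made-up sample; the labels are illustrative only and are not taken from the SOU data, and the helper name `accuracy` is hypothetical. If the 70% figure was measured on the 2,491 test records (the slides do not say), it would correspond to roughly 1,744 correct predictions.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one record")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels for illustration; not SOU records.
actual = ["F", "M", "F", "F", "M", "M", "F", "M", "F", "F"]
predicted = ["F", "M", "M", "F", "M", "F", "F", "M", "M", "F"]
print(accuracy(actual, predicted))  # 0.7
```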

Page 7:

To discover socio-economic correlations I added 1 attribute not in the original data:

• Added Distressed_County Attribute (economic status)

– 1. Non Distressed

– 2. Distressed

– 3. Severely Distressed

• And discretized the Age attribute into 6 classifications (birth years, with ages in parentheses)

– 1909 – 1939 (67-97)

– 1940 – 1949 (57-66)

– 1950 – 1959 (47-56)

– 1960 – 1969 (37-46)

– 1970 – 1979 (27-36)

– 1980 – 1986 (20-26)
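The age discretization above amounts to a lookup from birth year to one of six bins. A minimal sketch follows; the group numbers 1 to 6 and the function names are assumptions added for illustration, and the reference year 2006 is inferred from the ages shown in parentheses (for example, born 1980 gives age 26).

```python
# Birth-year bins as listed on the slide: (first year, last year, group number).
# The group numbers are an assumption; the slide lists the bins without numbering them.
AGE_BINS = [
    (1909, 1939, 1),
    (1940, 1949, 2),
    (1950, 1959, 3),
    (1960, 1969, 4),
    (1970, 1979, 5),
    (1980, 1986, 6),
]


def age_group(birth_year):
    """Return the bin number for a birth year, or None if it is outside all bins."""
    for first, last, group in AGE_BINS:
        if first <= birth_year <= last:
            return group
    return None


def age_in_year(birth_year, year=2006):
    """Age reached in the reference year (2006 is inferred from the slide's ages)."""
    return year - birth_year


print(age_group(1975), age_in_year(1975))  # 5 31
```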

Page 8:

Oregon counties: economic health

I based the Distressed attribute on:

• http://www.gonorthwest.com/Oregon/Oregon-cities.htm

3 = Severely Distressed (all are rural)

2 = Distressed (all except Marion are non-metro)

1 = Not Distressed

Page 9:

Map of Counties (socio-economic)

1. Red: distressed (rural)

2. Yellow: non-metro (except Marion)

3. Blue: not distressed

http://www.answers.com/topic/list-of-counties-in-oregon

Distressed counties

County Index Economic Status

Baker 0.64 Severely Distressed

Columbia 0.84 Distressed

Coos 0.77 Severely Distressed

Crook 0.85 Distressed

Curry 0.93 Distressed

Douglas 0.71 Severely Distressed

Gilliam 0.74 Severely Distressed

Grant 0.56 Severely Distressed

Harney 0.31 Severely Distressed

Hood River 0.87 Distressed

Jefferson 0.82 Distressed

Josephine 0.82 Distressed

Klamath 0.73 Severely Distressed

Lake 0.67 Severely Distressed

Lincoln 0.95 Distressed

Linn 0.76 Severely Distressed

Malheur 0.49 Severely Distressed

Marion 0.96 Distressed

Morrow 0.81 Distressed

Sherman 0.65 Severely Distressed

Umatilla 0.63 Severely Distressed

Union 0.79 Severely Distressed

Wallowa 0.77 Severely Distressed

Wasco 0.79 Severely Distressed

Wheeler 0.94 Distressed
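In the table above, every county labeled Severely Distressed has an index of 0.79 or lower and every county labeled Distressed has an index of 0.81 or higher. The sketch below checks that a cutoff of 0.80 reproduces all 25 labels. Both cutoffs are assumptions: 0.80 is inferred from the rows, and 1.00 is only a guess for where "Not Distressed" begins, since the slides give no index values for those counties.

```python
def economic_status(index, severe_below=0.80, distressed_below=1.00):
    """Map a county index to a status label.

    Both cutoffs are assumptions: 0.80 is inferred from the table rows,
    and 1.00 is a guess because no non-distressed index values are listed.
    """
    if index < severe_below:
        return "Severely Distressed"
    if index < distressed_below:
        return "Distressed"
    return "Not Distressed"


# (county, index, status) rows copied from the table above.
ROWS = [
    ("Baker", 0.64, "Severely Distressed"), ("Columbia", 0.84, "Distressed"),
    ("Coos", 0.77, "Severely Distressed"), ("Crook", 0.85, "Distressed"),
    ("Curry", 0.93, "Distressed"), ("Douglas", 0.71, "Severely Distressed"),
    ("Gilliam", 0.74, "Severely Distressed"), ("Grant", 0.56, "Severely Distressed"),
    ("Harney", 0.31, "Severely Distressed"), ("Hood River", 0.87, "Distressed"),
    ("Jefferson", 0.82, "Distressed"), ("Josephine", 0.82, "Distressed"),
    ("Klamath", 0.73, "Severely Distressed"), ("Lake", 0.67, "Severely Distressed"),
    ("Lincoln", 0.95, "Distressed"), ("Linn", 0.76, "Severely Distressed"),
    ("Malheur", 0.49, "Severely Distressed"), ("Marion", 0.96, "Distressed"),
    ("Morrow", 0.81, "Distressed"), ("Sherman", 0.65, "Severely Distressed"),
    ("Umatilla", 0.63, "Severely Distressed"), ("Union", 0.79, "Severely Distressed"),
    ("Wallowa", 0.77, "Severely Distressed"), ("Wasco", 0.79, "Severely Distressed"),
    ("Wheeler", 0.94, "Distressed"),
]

# Counties whose table label disagrees with the assumed cutoffs (expected: none).
mismatches = [name for name, index, status in ROWS if economic_status(index) != status]
print(mismatches)  # []
```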

Page 10:

County Economic Ranking

Rank, National Rank, County, Per Capita Income, Median Household Income

1 117 Clackamas County $25,973 $52,080

2 145 Washington County $24,969 $52,122

3 269 Multnomah County $22,606 $41,278

4 335 Benton County $21,868 $41,897

5 348 Deschutes County $21,767 $41,847

6 563 Columbia County $20,078 $45,797

7 634 Lane County $19,681 $36,942

8 670 Clatsop County $19,515 $36,301

9 674 Jackson County $19,498 $36,461

10 715 Polk County $19,282 $42,311

11 773 Tillamook County $19,052 $34,269

12 795 Yamhill County $18,951 $44,111

Page 11:

County Economic Ranking

Rank, National Rank, County, Per Capita Income, Median Household Income

13 864 Lincoln County $18,692 $32,769

14 958 Marion County $18,408 $40,314

15 1042 Curry County $18,138 $30,117

16 1148 Hood River County $17,877 $38,326

17 1240 Gilliam County $17,659 $33,611

18 1254 Linn County $17,633 $37,518

19 1284 Coos County $17,547 $31,542

20 1324 Sherman County $17,448 $35,142

21 1398 Wallowa County $17,276 $32,129

22 1416 Josephine County $17,234 $31,229

23 1433 Wasco County $17,195 $35,959

Page 12:

County Economic Ranking

Rank, National Rank, County, Per Capita Income, Median Household Income

24 1591 Union County $16,907 $33,738

25 1594 Crook County $16,899 $35,186

26 1640 Grant County $16,974 $32,560

27 1674 Klamath County $16,719 $31,537

28 1730 Douglas County $16,581 $33,223

29 1804 Umatilla County $16,410 $36,249

30 1922 Harney County $16,159 $30,957

31 1929 Lake County $16,136 $29,506

32 2045 Wheeler County $15,884 $28,750

33 2086 Morrow County $15,802 $37,521

34 2142 Jefferson County $15,675 $35,853

35 2167 Baker County $15,612 $30,367

36 2762 Malheur County $13,895 $30,241

Page 13:

Most interesting finding:

From 1990 to 2006, the number of graduates from non-distressed counties is far greater. However, the ratio of graduates to non-graduates within each grouping is extremely disproportionate when you compare groupings. Compared with the ratio for students who come from non-distressed counties, a predominant trend appears:

Students from distressed counties, and especially from severely distressed counties, who make it to SOU graduate.

Page 14:

Speculating the Reason: Financial Motivation

Education = Increased Income

Page 15:

Non-transfer students were the predominant graduates

Jackson, Jefferson, Josephine and Klamath represented transfer students

It looks like graduates coming from a distance know they want to attend SOU right out of high school.

Page 16:

Classified by major and transfer/non-transfer: there was no indication that any particular major was the motivation; however, our (state) tuition is relatively low, which is a possible motivator.

Page 17:

Other trends that were noted: the male (right) / female (left) ratio is about the same in each economic stratum

Page 18:

The age groupings by gender are fairly equal

Graduates tend to be older students

Top to bottom age: 1980 – 1986 (20-26), 1970 – 1979 (27-36), 1960 – 1969 (37-46), 1950 – 1959 (47-56), 1940 – 1949 (57-66), 1909 – 1939 (67-97)

Page 19:

Major was the first split in the decision tree. Clustering showed general trends: males tended to be ‘sparse’ among English graduates, and female graduates were ‘sparse’ in all years within the CS programming track (82% M / 17% F). The same held in CSIS (79% M, 21% F) and in the rest, categorized as ‘general CS’ (92% M, 8% F), for a total across all tracks of 81% M, 19% F.
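Percentages like these come from counting graduates by gender within each major. A minimal sketch of that tally follows. The sample is made up so that its CSIS share matches the 79% / 21% reported above; it is not the actual SOU counts, and the function name `gender_share` is hypothetical.

```python
from collections import Counter


def gender_share(records, major):
    """Percentage of each gender among records with the given major code."""
    counts = Counter(gender for code, gender in records if code == major)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gender: round(100 * n / total) for gender, n in counts.items()}


# Made-up (major, gender) pairs for illustration; not the actual SOU counts.
sample = (
    [("CSIS", "M")] * 79
    + [("CSIS", "F")] * 21
    + [("ENGL", "F")] * 3
    + [("ENGL", "M")] * 1
)
print(gender_share(sample, "CSIS"))  # {'M': 79, 'F': 21}
```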

Page 20:

The 108 clusters clearly show the disparity in graduates from certain severely economically distressed counties

Page 21:

Age Groupings and Counties

Top to bottom age: 1980 – 1986 (20-26), 1970 – 1979 (27-36), 1960 – 1969 (37-46), 1950 – 1959 (47-56), 1940 – 1949 (57-66), 1909 – 1939 (67-97)

Page 22:

Jitter pulled back to show our near neighbors (bottom): Douglas, Jackson, Josephine, Klamath

Page 23:

Age Groupings of Near Neighbor Graduates

Left to right age: 1909 – 1939 (67-97), 1940 – 1949 (57-66), 1950 – 1959 (47-56), 1960 – 1969 (37-46), 1970 – 1979 (27-36), 1980 – 1986 (20-26)

Page 24:

Near neighbor is the x axis; transfer (yes/no) is the y axis

Page 25:

Josephine County (distressed) produced one female CSIN major graduate (in the year 2000). This is not definitive: I was clicking on instances to see what I could find and could have missed another female from this county.

Page 26:

Gender is 50/50 for Near Neighbor Graduates

Page 27:

Listing by county: number of graduates

Max-min number of graduates: 4,854 Jackson (near neighbor), 581 Multnomah (distant)

(Wheeler 0, Gilliam 2; distant), Lake 39, Harney 25, Grant 9, Malheur 22, Crook 20