purpose: to discover and predict trends of oregon graduates from sou:

Post on 08-Jan-2016



Purpose: to Discover and Predict

Trends of Oregon Graduates from SOU:

1. Who graduates and who doesn’t – why?

2. Do some majors tend to have more of one gender than another – why?

3. Does economic / cultural background influence choice of major – why?

4. What do we see within age groupings – do certain age groupings gravitate toward certain majors and not others?

5. And finally, can we predict gender from the attributes of their major, age grouping, county, economic status, transfer status, and year graduated?

Data Modeling Tool Used: WEKA

• Classification / Prediction: WEKA decision tree (70% accuracy) predicted gender based on attributes of major, age grouping, year of graduation, county, economic status, transfer student.

• Clustering (108): Visually shows patterns of trends for combinations of attributes

Data

• 12,982 records of SOU graduates from 1990 to present

– 10,491 for training

– 2,491 for testing

• Attributes

– PID

– Year graduated {1990 – 2006}

– Transfer student or not {Y,N}

– County {36}

– Economic county status Distressed {1,2,3}

– Age (Discretized into 7 categories)

– Major {1 - 297}

– Gender {F,M}
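As a rough illustration of the split described above, the sketch below holds out 2,491 of the 12,982 records for testing. The seeded shuffle, the seed value, and the name `split_indices` are assumptions; the slides do not say how the split was actually made.

```python
import random

TOTAL_RECORDS = 12_982  # from the slides
TEST_SIZE = 2_491       # from the slides


def split_indices(total, test_size, seed=0):
    """Return (train, test) index lists from a seeded shuffle (assumed method)."""
    indices = list(range(total))
    random.Random(seed).shuffle(indices)
    return indices[test_size:], indices[:test_size]


train_idx, test_idx = split_indices(TOTAL_RECORDS, TEST_SIZE)
print(len(train_idx), len(test_idx))
```

The two index lists are disjoint and together cover every record, which matches the 10,491 + 2,491 = 12,982 total.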

Attribute: major (297)@attribute MAJOR1_CODE{GSHU,COMS,HPOL,GSHF,GSIN,GSSM,MSBA,SEFR,SEMT,CSIS,MUCN,SEAT,SEMU,SES,SESP,EESP,SEBI,SEIS,SAAS,ABFA,ACA,AFE,ANTH,ANTP,ARLP,ARLT,ART,ARTH,ARTP,BA,BACH,BAHR,HR,BAMG,BAMK,BAMT,BAMU,BANM,BAOM,BAOP,BAPB,BAPH,BAPM,BARM,BASB,BCHP,BED,BIO,BIOH,BIOP,BMTP,BMUP,BOTC,BUSP,CBIS,CCJ,CHAC,CHBA,CHBI,CHEM,CHEP,CHPA,CIM,COJO,CJOP,CMHR,CMM,COMM,COMP,COBR,COTE,CRIM,CRIP,CRM,CS,CSG,CSIA,CSIN,CSMA,CSP,CSPS,ECD,ECEL,ECON,ECOP,ECTL,ED1P,ED2P,ED3P,ED4P,ED5P,ED6P,ED7P,ED8P,ED9P,EDEC,EDMS,EDP,EDUC,EE,EECI,EECT,EEHE,EEEC,EEHC,EEHL,EERE,EESB,EESL,ESP,EESU,EETS,EIAL,ELED,ELMS,EMAT,EMBE,EMBI,EMCH,EMDR,EMFR,EMGE,EMHE,EMIS,EMLA,BAAC,EMMT,EMMU,EMPE,EMPH,EMS,EMSP,EMSS,ENG,ENGL,ENGP,ENGR,ENGW,ES,ESB,ESC,ESG,ESGR,ESSP,FPA,GEGP,GEOG,GEOL,GEOP,GSBE,GSSS,HISP,HIST,HPAT,HPE,HPHP,HPP,HPPE,HPHS,HSP,HUM,INDP,INTD,INTP,INTS,LAFP,LAGP,LANC,LANF,LANG,LANS,LASP,MACS,MAP,MATH,MBA,MECI,MEEC,MERE,MESP,MHAT,MHBE,MHBI,MHCH,MHDR,MHFR,MHGE,MHHE,MHHP,MHIS,MHLA,MHMT,MHMU,MHPE,MHPH,MHS,MHSP,MHSS,MIM,MIMP,MMC,MMST,MSSP,MTAT,MTBE,MTBI,MTCH,MTDR,MTFR,MTGE,MTHE,MTHH,MTHP,MTIS,MTLA,MTMT,MTMU,MTPE,MTPH,MTRE,MTS,MTSE,MTSP,MTSS,MUIN,MUPF,MUS,MUSP,NAAM,NURP,NURS,PCHM,PCJO,PDEM,PDEN,PDHY,PEGR,PHR,PHYA,PHYP,PHYS,PLAW,PMED,PMET,POLP,POLS,POPT,POTH,PPAS,PPHA,PPTH,PRAM,PSY,PSY2,PSY3,PSY4,PSY5,PSY6,PSY7,PSYA,PSYC,PSYP,PVET,SCI,SCIP,SCTL,SEBE,SED,SEHC,SEHE,SEHL,SEHU,SELA,SEPE,SERE,SESB,SESL,SESM,SESS,SESU,SETS,SOC,SOLP,SPAN,SSCD,SSCI,SSCR,SSHS,SSPD,SSPS,TAFA,TBFA,TEAC,THAR,THEA,THEP,UNDL}

Decision Tree splits: major, county, transfer, age, graduation year, gender

10,491 Training Set; 2,491 Test Set

Total = 12,982

Test data DT

Findings:

• The decision tree predicted gender (F/M) from major and the other attributes with about 70% accuracy (by following the tree you can trace the branching to see how the attributes are classified and how they relate)

• But there were other interesting patterns found using clustering (especially socio-economic)
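To make the idea of a first split on major concrete, here is a deliberately simplified stand-in: a one-level "stump" that predicts the majority gender seen for each major. It is not the WEKA decision tree used in this project, and the toy records and function names are hypothetical.

```python
from collections import Counter, defaultdict


def fit_major_stump(records):
    """Learn the majority gender per major from (major, gender) pairs."""
    per_major = defaultdict(Counter)
    overall = Counter()
    for major, gender in records:
        per_major[major][gender] += 1
        overall[gender] += 1
    # Fallback for majors never seen in training: overall majority gender.
    default = overall.most_common(1)[0][0]
    table = {m: c.most_common(1)[0][0] for m, c in per_major.items()}
    return table, default


def predict(model, major):
    table, default = model
    return table.get(major, default)


def accuracy(model, records):
    """Fraction of records whose gender the stump predicts correctly."""
    correct = sum(predict(model, m) == g for m, g in records)
    return correct / len(records)


# Hypothetical toy data, not taken from the SOU records.
train = [("CS", "M"), ("CS", "M"), ("CS", "F"),
         ("ENG", "F"), ("ENG", "F"), ("ENG", "M"), ("ART", "F")]
test = [("CS", "M"), ("ENG", "F"), ("CS", "F"), ("NURS", "F")]

model = fit_major_stump(train)
print(accuracy(model, test))
```

A full tree would keep splitting on county, transfer status, age, and graduation year beneath each major; the stump only shows how accuracy on a held-out test set is measured.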

To discover socio-economic correlations I added 1 attribute not in original data:

• Added Distressed_County Attribute (economic status)

– 1. Non Distressed

– 2. Distressed

– 3. Severely Distressed

• And Discretized Age Attribute into 6 Classifications

– 1909 – 1939 (67-97)

– 1940 – 1949 (57-66)

– 1950 – 1959 (47-56)

– 1960 – 1969 (37-46)

– 1970 – 1979 (27-36)

– 1980 – 1986 (20-26)
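The age discretization above can be sketched as a lookup from birth year to one of the six classes. The class numbering (1 for the oldest range through 6 for the youngest) and the name `age_group` are assumptions made for illustration; the slides give only the ranges.

```python
# (first birth year, last birth year, age range in 2006), as listed on the slide.
AGE_BINS = [
    (1909, 1939, "67-97"),
    (1940, 1949, "57-66"),
    (1950, 1959, "47-56"),
    (1960, 1969, "37-46"),
    (1970, 1979, "27-36"),
    (1980, 1986, "20-26"),
]


def age_group(birth_year):
    """Return the 1-based class number for a birth year, or None if out of range."""
    for number, (first, last, _label) in enumerate(AGE_BINS, start=1):
        if first <= birth_year <= last:
            return number
    return None


print(age_group(1975), age_group(1985))
```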

Oregon counties economic health

• http://www.gonorthwest.com/Oregon/Oregon-cities.htm

3 = Severely Distressed (all are rural)

2 = Distressed (except Marion, all are non-metro)

1 = Not Distressed

I had based the Distressed Attribute on:

Map of Counties (socio-economic)

1. Red: distressed (rural)

2. Yellow: non-metro (except Marion)

3. Blue: not distressed

http://www.answers.com/topic/list-of-counties-in-oregon

Distressed counties

County Index Economic Status

Baker 0.64 Severely Distressed

Columbia 0.84 Distressed

Coos 0.77 Severely Distressed

Crook 0.85 Distressed

Curry 0.93 Distressed

Douglas 0.71 Severely Distressed

Gilliam 0.74 Severely Distressed

Grant 0.56 Severely Distressed

Harney 0.31 Severely Distressed

Hood River 0.87 Distressed

Jefferson 0.82 Distressed

Josephine 0.82 Distressed

Klamath 0.73 Severely Distressed

Lake 0.67 Severely Distressed

Lincoln 0.95 Distressed

Linn 0.76 Severely Distressed

Malheur 0.49 Severely Distressed

Marion 0.96 Distressed

Morrow 0.81 Distressed

Sherman 0.65 Severely Distressed

Umatilla 0.63 Severely Distressed

Union 0.79 Severely Distressed

Wallowa 0.77 Severely Distressed

Wasco 0.79 Severely Distressed

Wheeler 0.94 Distressed
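A minimal sketch of how the Distressed_County attribute could be derived from the table above. Treating every county absent from the table as code 1 (Not Distressed) is an assumption, as is the function name `distress_code`.

```python
# County names copied from the "Distressed counties" table.
SEVERELY_DISTRESSED = {
    "Baker", "Coos", "Douglas", "Gilliam", "Grant", "Harney", "Klamath",
    "Lake", "Linn", "Malheur", "Sherman", "Umatilla", "Union", "Wallowa",
    "Wasco",
}
DISTRESSED = {
    "Columbia", "Crook", "Curry", "Hood River", "Jefferson", "Josephine",
    "Lincoln", "Marion", "Morrow", "Wheeler",
}


def distress_code(county):
    """Map a county name to 1 (not distressed), 2 (distressed), or 3 (severely)."""
    if county in SEVERELY_DISTRESSED:
        return 3
    if county in DISTRESSED:
        return 2
    # Assumption: counties not listed in the table are Not Distressed.
    return 1


print(distress_code("Harney"), distress_code("Marion"), distress_code("Jackson"))
```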

County Economic Ranking

Rank | National Rank | County | Per Capita Income | Median Household Income

1 117 Clackamas County $25,973 $52,080

2 145 Washington County $24,969 $52,122

3 269 Multnomah County $22,606 $41,278

4 335 Benton County $21,868 $41,897

5 348 Deschutes County $21,767 $41,847

6 563 Columbia County $20,078 $45,797

7 634 Lane County $19,681 $36,942

8 670 Clatsop County $19,515 $36,301

9 674 Jackson County $19,498 $36,461

10 715 Polk County $19,282 $42,311

11 773 Tillamook County $19,052 $34,269

12 795 Yamhill County $18,951 $44,111


13 864 Lincoln County $18,692 $32,769

14 958 Marion County $18,408 $40,314

15 1042 Curry County $18,138 $30,117

16 1148 Hood River County $17,877 $38,326

17 1240 Gilliam County $17,659 $33,611

18 1254 Linn County $17,633 $37,518

19 1284 Coos County $17,547 $31,542

20 1324 Sherman County $17,448 $35,142

21 1398 Wallowa County $17,276 $32,129

22 1416 Josephine County $17,234 $31,229

23 1433 Wasco County $17,195 $35,959


24 1591 Union County $16,907 $33,738

25 1594 Crook County $16,899 $35,186

26 1640 Grant County $16,974 $32,560

27 1674 Klamath County $16,719 $31,537

28 1730 Douglas County $16,581 $33,223

29 1804 Umatilla County $16,410 $36,249

30 1922 Harney County $16,159 $30,957

31 1929 Lake County $16,136 $29,506

32 2045 Wheeler County $15,884 $28,750

33 2086 Morrow County $15,802 $37,521

34 2142 Jefferson County $15,675 $35,853

35 2167 Baker County $15,612 $30,367

36 2762 Malheur County $13,895 $30,241


Most interesting finding:

From 1990 to 2006, far more graduates come from non-distressed counties. However, the ratio of graduates to non-graduates within each grouping is extremely disproportionate across groupings. Comparing each grouping's ratio with that of students from non-distressed counties shows a predominant trend:

Students from distressed, and especially from severely distressed, counties who make it to SOU graduate.
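The distinction between raw graduate counts and the within-group ratio can be sketched as follows. The counts are hypothetical placeholders chosen only to show the arithmetic; they are not the SOU figures, and `grad_ratio` is an illustrative name.

```python
def grad_ratio(graduates, non_graduates):
    """Graduates per non-graduate within one grouping."""
    return graduates / non_graduates


# Hypothetical counts for illustration only; not from the SOU data.
example_counts = {
    "not distressed": (900, 600),
    "distressed": (80, 20),
    "severely distressed": (45, 5),
}

for group, (grads, non_grads) in example_counts.items():
    print(group, grads, round(grad_ratio(grads, non_grads), 2))
```

In this made-up example the non-distressed group has by far the most graduates, yet the lowest ratio of graduates to non-graduates, which is the shape of the pattern described above.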

Speculating the Reason: Financial Motivation

Education = Increased Income

Non-transfer students were the predominant graduates

Jackson, Jefferson, Josephine and Klamath represented transfer students

It looks like graduates coming from a distance know they want to attend SOU right out of high school.

Classified by major and transfer/non-transfer: There was no indication of any particular major being the motivation; however, our (state) tuition is relatively low – a possible motivator.

Other Trends that were noted:

Male (right) / Female (left) ratio is about the same per economic stratum

The age groupings by gender are fairly equal

Graduates tend to be older students

Top to bottom age: 1980 – 1986 (20-26), 1970 – 1979 (27-36), 1960 – 1969 (37-46), 1950 – 1959 (47-56), 1940 – 1949 (57-66), 1909 – 1939 (67-97)


Majors were the first split in the Decision Tree. Clustering showed general trends: males tended to be ‘sparse’ among English graduates, and female graduates were ‘sparse’ in all years within the CS programming track (82% M / 17% F). The same held in CSIS (79% M, 21% F) and in the rest, categorized as ‘general CS’ (92% M, 8% F), for a total across all tracks of 81% M, 19% F.

The 108 clusters clearly show the disparity of graduates from certain severely economically distressed counties

Age Groupings and Counties

Top to bottom age: 1980 – 1986 (20-26), 1970 – 1979 (27-36), 1960 – 1969 (37-46), 1950 – 1959 (47-56), 1940 – 1949 (57-66), 1909 – 1939 (67-97)

Jitter pulled back to show our near neighbors (bottom): Douglas, Jackson, Josephine, Klamath

Age Groupings of Near Neighbor Graduates

Left to right age: 1909 – 1939 (67-97), 1940 – 1949 (57-66), 1950 – 1959 (47-56), 1960 – 1969 (37-46), 1970 – 1979 (27-36), 1980 – 1986 (20-26)

Near Neighbor is the X axis

Transfer yes/no is the Y axis

Josephine County (distressed) produced one female CSIN major graduate (in the year 2000). This is not definitive: I was clicking on instances to see what I could find and could have missed another female from this county.

Gender is 50/50 for Near Neighbor Graduates

Listing by County, Number of Graduates

Max-min number of graduates: 4854 Jackson (near neighbor), 581 Multnomah (distant)

(Wheeler 0, Gilliam 2, distant), Lake 39, Harney 25, Grant 9, Malheur 22, Crook 20
