CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrolment


Individual Level Predictive Analytics

Improving Student Enrolment Outcomes

Stephen Childs, Institutional Analyst, @sechilds
CIRPA/PNAIRP 2016, Kelowna, BC
November 7, 2016

Office of Institutional Analysis

Why Predictive Analytics and IR

- Higher education institutions collect more data
- IR offices have experts in institutional data
- IR offices are seeking ways to add more value
- Machine learning and predictive models are in the news


Opportunity… or Crisis?

- Predictive analytics is a different skill set
- A different set of software tools is required
- You may be the only analyst working on this in your office
- Requesters expect you to be the expert
- There is resistance to implementing insights from predictive analytics

The way forward

- Add these skills to your IR toolkit
- Find tools that work with your existing ones
- Develop your understanding and expertise
- Community of practice

Learning Outcomes

- Have a high-level understanding of what predictive analytics does and how it works
- Have a concrete series of steps to follow
- Know the vocabulary of machine learning and statistical modeling
- Know what tools can be used for this, and how they work with existing tools
- Know how we select, test, and train models for prediction
- Learn some of the challenges in predictive modeling

Outline

- Introduction (already done?)
- Introduction to Machine Learning
- Model Building Steps
- Tool Overview
- Customer Education
- Challenges
- Building Community

About Me

Machine Learning

- Contrast with statistics
- Supervised and unsupervised learning
- Classification and regression (see the sketch below)
- Different algorithms
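
To make the classification/regression vocabulary concrete, here is a minimal sketch in Python with scikit-learn on synthetic data. The feature meanings (high-school average, campus visits) and the two model choices are illustrative assumptions, not content from the talk.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    # Synthetic stand-in data: two applicant features, e.g. high-school
    # average and number of campus visits (both hypothetical).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))

    # Classification: the outcome is a category (registered or not).
    y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)
    clf = LogisticRegression().fit(X, y)
    print(clf.predict_proba(X[:5])[:, 1])   # predicted probability of registering

    # Regression: the outcome is a continuous number (e.g. first-year GPA).
    gpa = 2.5 + 0.5 * X[:, 0] + rng.normal(scale=0.3, size=500)
    reg = LinearRegression().fit(X, gpa)
    print(reg.predict(X[:5]))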

Predictive Data Analysis Steps

1. Goal
2. Data Access
3. Analysis File
4. Model
5. Delivery

STEP 1: Define Your Goal

- Sets the scope of your analysis
- Provides input into model selection
- Identifies stakeholders
- Discover what data is available
- Revise as the project progresses

STEP 2: Get Access to your Data

- Three different types of data (see the sketch after this list):
  - Operational SIS
  - Data warehouse snapshots
  - Predictive analytics data
- Talk to your DBA to find out which tables to use
- Think of other data to add:
  - Residence, CRM
  - Socio-economic data
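
As a sketch of pulling these sources together (every table and column name below is hypothetical; your SIS and warehouse schemas will differ), pandas can read straight from a database connection:

    import sqlite3   # any DB-API connection works; SQLite keeps the sketch simple
    import pandas as pd

    conn = sqlite3.connect("warehouse.db")   # hypothetical local copy

    # Operational SIS extract.
    applications = pd.read_sql(
        "SELECT applicant_id, program, app_date FROM sis_applications", conn)

    # Data-warehouse snapshot, e.g. registration status frozen at census date.
    snapshot = pd.read_sql(
        "SELECT applicant_id, snapshot_date, registered FROM enrolment_snapshots", conn)

    # Other data to add: residence, CRM, socio-economic indicators.
    residence = pd.read_sql(
        "SELECT applicant_id, in_residence FROM residence", conn)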

STEP 3: Build an Analysis File

- Extract, Transform, Load (ETL):
  - Use as much existing ETL as you can
  - Join tables together
  - Work with a programmer, but the analyst drives
- It is hard to capture the timeline of the application (see the sketch below):
  - When did they apply?
  - When were they accepted?
  - When did they register?
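
A sketch of the join-and-timeline step, continuing the hypothetical tables from the previous sketch; the status-event layout and the pivot to one row per applicant are assumptions about one way to capture the timeline:

    # Join the application records to supplementary sources.
    analysis = applications.merge(residence, on="applicant_id", how="left")

    # Capture the timeline: pivot status events (applied, accepted, registered)
    # into one dated column per event, one row per applicant.
    events = pd.read_sql(
        "SELECT applicant_id, status, status_date FROM sis_status_events", conn)
    timeline = events.pivot_table(index="applicant_id", columns="status",
                                  values="status_date", aggfunc="min").reset_index()
    analysis = analysis.merge(timeline, on="applicant_id", how="left")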

STEP 3: Build an Analysis File – Tools

STEP 3: Build an Analysis File – Best Practices

- Test your ETL process (automated is better)
- Save your data in a database (an existing one, or SQLite)
- Append rows to the table with a timestamp and a test indicator (sketch below)
- Keep track of the program version
- Keep a changelog
- Capture more data, then filter it for analysis
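
A minimal sketch of the append-and-timestamp practice, continuing the hypothetical analysis frame above (the table name, version string, and flag columns are illustrative):

    from datetime import datetime, timezone

    # Stamp every load so earlier rows are never overwritten, and mark test runs.
    analysis["loaded_at"] = datetime.now(timezone.utc).isoformat()
    analysis["is_test"] = True            # flip to False for production loads
    analysis["etl_version"] = "0.3.1"     # track the program version; see changelog

    # Append, never replace, into the analysis database.
    analysis.to_sql("analysis_file", conn, if_exists="append", index=False)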

STEP 4: Develop a Model

[Diagram: a function maps student characteristics to outcomes. The inputs are called independent variables in statistics and features in machine learning; the outcome is the dependent variable; the mapping itself is a formula in statistics and an algorithm in machine learning.]

STEP 4: Develop a Model – Things to Watch Out For

- Missing data
- Multiple models
- Model testing (see the sketch below)
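
One way two of these watch-outs show up in code, continuing the synthetic X and y from the machine-learning sketch above; the injected gaps, imputation strategy, and hold-out split are generic illustrations, not the approach described in the talk:

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Real applicant data has gaps; simulate a few here for the demo.
    X_missing = X.copy()
    X_missing[::50, 1] = np.nan

    # Missing data: impute (here with the median) rather than dropping applicants.
    X_filled = SimpleImputer(strategy="median").fit_transform(X_missing)

    # Model testing: hold out applicants the model never sees during fitting.
    X_train, X_test, y_train, y_test = train_test_split(
        X_filled, y, test_size=0.25, random_state=0)

    model = LogisticRegression().fit(X_train, y_train)
    print(model.score(X_test, y_test))   # accuracy on the held-out set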

STEP 4: Develop a Model - Accuracy

- Refer back to your goal: there is no universal measure of accuracy
- Consider how the model will be used for decision making and resource allocation
- Assign a loss to incorrect predictions and minimize it
- Receiver Operating Characteristic (ROC) and Area Under the Curve (AUC) (see the sketch below)
- Bias-variance trade-off and overfitting
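
Continuing the held-out split from the sketch above, ROC and AUC take a few lines in scikit-learn:

    from sklearn.metrics import roc_auc_score, roc_curve

    # Rank held-out applicants by predicted probability of registering.
    probs = model.predict_proba(X_test)[:, 1]

    # AUC summarizes the whole ROC curve: 0.5 is chance, 1.0 is a perfect ranking.
    print("AUC:", roc_auc_score(y_test, probs))

    # The curve itself shows the trade-off at every decision threshold.
    fpr, tpr, thresholds = roc_curve(y_test, probs)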

STEP 5: Deliver Your Results

- Set up delivery early
- Meet with your audience and set expectations
- How will the data be used? Refer back to your goal
- Dashboards
- Data files (see the sketch below)
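
One sketch of the data-file route (the score column, file name, and X_new feature matrix for the current cycle are all hypothetical; a dashboard could query the same table instead):

    # Score applicants and deliver as a flat file.
    scored = analysis[["applicant_id", "program"]].copy()
    scored["p_register"] = model.predict_proba(X_new)[:, 1]
    scored.to_csv("registration_scores.csv", index=False)

    # Or write back to the database for a dashboard to read.
    scored.to_sql("registration_scores", conn, if_exists="append", index=False)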

STEP 5: Delivery to Students

- You have to present information to students carefully:
  - Present a positive outlook
  - Don’t personalize it; talk about a group of similar students
- The factors in the model may be less deterministic than unobserved factors
- Difference between causality and correlation
- Beware the self-fulfilling prophecy

Cathy O’Neil

- @mathbabe, mathbabe.org
- Mathematician, former hedge-fund quant

Weapons of Math Destruction

Three factors make a model a WMD:
- Is the participant aware of the model? Is the model opaque or invisible?
- Does the model work against the participant’s interest? Is it unfair? Does it create feedback loops?
- Can the model scale?

Experience So Far

- Getting the data took longer than anticipated
- Working with the data was a great learning experience
- Built an automated process for harvesting data
- Starting to work on the delivery end

Challenges

- Data quality
- Not enough right-hand-side (RHS) variables
- More categorical variables than in typical machine-learning problems (see the sketch below)
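
The categorical-variable point often comes down to encoding; a minimal pandas sketch, with hypothetical column names on the analysis frame from earlier sketches:

    # Institutional data is heavy on categoricals (program, faculty, home
    # province); one-hot encode them so numeric algorithms can use them.
    encoded = pd.get_dummies(analysis,
                             columns=["program", "faculty", "home_province"],
                             drop_first=True)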

Community of Practice

- Predictive Analytics Roundtable
- Mailing list (more discussion in future): http://mailman.ucalgary.ca/mailman/listinfo/predictive-l
- Stephen.Childs@ucalgary.ca
- @sechilds #CIRPA2016
- PyData and other user groups
