michigan dgs 2015 presentation - leveraging big data with meaningful analytics - kirk ocke

12
Unlocking the potential of your data

Upload: erepublic

Post on 08-Dec-2015

5 views

Category:

Documents


1 download

DESCRIPTION

Michigan DGS 2015 Presentation - Leveraging Big Data With Meaningful Analytics by Kirk Ocke

TRANSCRIPT

Unlocking the potential of your data

Background

PARC | 2

• What is Big Data?– Characterized by: Volume, Velocity, Variety, …– Cutting edge techniques and technologies often required.– Buzz word or real phenomenon? Both!– Big Data requirements become apparent once you understand the problem you’re

solving.

• Just Plain Old Data:– Most uses of data do not currently require big data solutions.– Instead focus on what you want to use your data for? What are your business

objectives? What value are you trying to create?– There is a lot of value locked away inside your plain old data.

The Dream: Data that leads to Actionable Insights

PARC | 3

XIX F4A F4A 1 0 CPR 20 FEMIVA C4A C4A 4 0 CPR 17 FEMN4A F4A F4A 2 0 CPR 31 FEMXIX F4A F4A 3 0 CPR 20 FEMXIX F4A F4A 3 0 CPR 34 FEMN4A F4A F4A 2 0 CPR 29 FEMN4A ONA ONA 2 1 CPR 20 FEMIVA ONA ONA 1 0 CPR 19 FEMN4A F4A F4A 1 0 WNG 18 FEMIVA C4A C4A 5 0 CPR 18 FEMIVA C4A C4A 1 0 NCP 20 FEMN4A ONA ONA 2 0 CPR 21 FEMN4A ONA ONA 1 0 CPR 30 FEMN4A F4A F4A 1 0 CPR 16 FEMAAO F4A F4A 3 0 CPR 31 FEMN4A F4A F4A 1 0 CPR 16 FEMAAO F4A F4A 3 0 CPR 17 FEMN4A ONA ONA 1 0 CPR 36 FEMXIX F4A F4A 2 0 CPR 32 FEMXIX MNA MNA 2 0 CPR 15 FEMAAO F4A C4A 2 0 CPR 19 FEMN4A F4A F4A 2 0 CPR 18 FEMN4A F4A F4A 1 1 CPR 22 FEMAAO F4A F4A 1 1 CPR 16 FEMIVA C4A C4A 1 0 CPR 23 FEMXIX F4A F4A 1 1 CPR 20 FEMN4A F4A F4A 1 0 CPR 14 FEMXIX ONA ONA 3 0 CPR 20 FEMN4A F4A F4A 1 0 CPR 21 FEMIVA C4A C4A 1 0 WNG 21 FEM

Data AnalysisInsights that allow us to take the actions required to accomplish something extraordinary!

Small market teams effectively compete with large market teams.

BaseballPlayerData

Bill James

The Reality: Disparate Data, Vague Goals and Organizational Inertia

PARC | 4

LOGS

SensorsMainframes

VariousDatabases Paper

Restricted Access

Disparate DataWhat do we want to do again?

Get results of course!

Vague Goals

The Results speak for themselves

Everyone will love them.

Organizational Inertia

The Solution:Understand your data, goals and organization

PARC | 5

Understand your data:• Explore the data in order to gain a rich understanding of what’s there.• Transform the data• Never under estimate the effort required here.

Perform the right analysis:• What problem do you want to solve? • Focus on the value to be created.• The right team (yes you need a team) • Use the simplest model to get the job done.

Change the organization is necessary:• Using insights from data often requires people change the way they work:• Consider this when you set your goals.• Often change agents are needed at many levels of the organization.• Organizational structure and culture are important, and they are often completely

overlooked – or are an after thought – when data analysis projects are conceived.

PARC | 6

Ohio DJFS:Who pays their child support

IMS Database

CustomCode

SQL

Raw Data

Census Data

Understand the data:

Augment with public dataMissing Values?Etc…

1.33 TB

SQL

TransformedData

2 GB

6+ Months

Goals: • Describe who is and isn’t paying their child

support. • Develop predictive scoring models to rank

cases at case initiation.• Build models that are interpretable by case

workers.

Team:• Data Scientist and Researchers• Software Engineers, • Subject Matter Experts - including former

IV-D directors

Organization:• State Run - County Administered• Case workers are knowledge workers• Case workers touch real lives.

PARC | 7

Ohio DJFS:Who pays their child support

Percent of Cases

30% pay ≥ 80% ofestablished obligation

52%pay < 80% ofestablished obligation

10%Zeroorders

8%No

Order

Pay 80% or more

Pay Less than 80%

Average Age Oldest Child at Case Initiation 4.7 3.9

Average Age Youngest Child at Case Initiation 3.8 3.3

Average Number Children CP Born Out of Wedlock 1.0 1.6

Average Number Children AP Born Out of Wedlock 1.0 2.0

Percent where CP and AP Formerly Married 46% 19%

Pay 80% or more

Pay Less than 80%

Average Percent of Total Obligation Paid over life of case 95% 38%

PARC | 8

Ohio DJFS:Scoring Model for Cuyahoga County

Largest county in Ohio by population in 2010Surrounds Cleveland

62002 CasesInitiated btw

FY 05 – FY 13

57 featuresData known at case initiation

27 categorical30 numerical

Link to Dashboard

53

53 is the right hand side of this equation: if k � 𝑋𝑋𝑋𝛽𝛽𝑋 ≥ k 𝑙𝑙𝑙𝑙𝑙𝑙 𝑡𝑡

1−𝑡𝑡− 𝛽𝛽0

then �𝑌𝑌 = 1

Taking the suggestion from the last presentation to consider county level analysis we built our models only at the county level.

Ohio DJFS:Cuyahoga County Model Performance

PARC | 9

Prediction Model

Pool of All Cases (N)

15% green 85% red

High-Value Cases (Top 20% of N)

63% Recall

47% Precision(205% increase in predictive

power over random guessing)

= case with an AP that paid ≥ 80% of child support obligation over life of case

= case with an AP that paid < 80% of child support obligation over life of case

Key MessageThe top fifth of the caseload, according to the model,

contains almost 2/3 of the good paying cases.

“You could theoretically touch almost 2/3 of the good paying cases by only working 20% of the

caseload”

PARC | 10

County of Los Angeles, CA:Who’s doing all the printing?The largest county in the U.S. – County of Los Angeles, CA.

Employees in 33 departments were spending millions of dollars printing. No oversight. No data. No acquisition strategy.

Through assessments, the CIO’s office discovered 43,000 printers and copiers.

PARC | 11

County of Los Angeles, CAManaging Printing Across 33 Departments

Department inputs

“The fact that we have numbers has been very effective. It’s rare there are quantitative results for these kinds of projects.”

Rich Sanchez, CIO County of LA, CA

• $9M annual savings• $50M+ total savings • 56% printer decrease -

44,000 to 18,500 • Less administrative

effort & IT support • Cut electrical

consumption 58% = 700 homes annually

PARC | 12

Thank you