Data Mining for Business Analytics - Week 1

Introduction to Data Mining

Data Mining for Business Analytics

Introduction to Data Science

Kent State University

Spring 2015 – Class 1

These slides incorporate the result of input/collaborations with Maytal Saar-Tsechansky, Claudia Perlich, and Foster Provost.


An example business problem

• TelCo, a major telecommunications firm, wants to investigate its problem with customer attrition, or “churn”

• Let’s consider this for now as a marketing problem only

How would you go about targeting some customers with a special offer, prior to contract expiration? Think about what data should be available for you to use.

Moneyball

• The story of how Oakland A's general manager Billy Beane put together a successful baseball club on a budget by using computer-based data analysis to draft his players.

Roles in Data Science

• Data Scientist
  • Understands the potential
  • Can translate from business to execution
  • Can evaluate a proposal and its execution
  • Can do the actual modeling
  • Applied statistician x computer scientist

• Collaborator in a data-centric project
  • Can translate from business to execution

• Manager of a data-mining project
  • Understands the potential
  • Can evaluate a proposal and its execution
  • Can interface with a broad variety of people

• Strategist, Investor, …
  • Envisions opportunities, comes up with novel ideas, evaluates the promise of new ideas, designs data science projects / companies conceptually

Learning Goals

• Approach business problems data-analytically
  • Think carefully & systematically about whether & how data can improve performance

• Be able to interact competently on the topic of data mining for business analytics
  • Know the basics of the data mining processes, techniques, and concepts well enough

• Receive hands-on experience mining data
  • You should be able to follow up on ideas or opportunities that present themselves

From data & business to strategy

HURRICANE FRANCES was on its way, barreling across the Caribbean, threatening a direct hit on Florida's Atlantic coast. Residents made for higher ground, but far away, in Bentonville, Ark., executives at Wal-Mart Stores decided that the situation offered a great opportunity for one of their newest data-driven weapons, something that the company calls predictive technology.

A week ahead of the storm's landfall, Linda M. Dillman, Wal-Mart's chief information officer, pressed her staff to come up with forecasts based on what had happened when Hurricane Charley struck several weeks earlier. Backed by the trillions of bytes' worth of shopper history that is stored in Wal-Mart's data warehouse, she felt that the company could "start predicting what's going to happen, instead of waiting for it to happen," as she put it.

From NY Times

Big Data Hype?

[Figure: Gartner Hype Cycle]

What do we really care about?

Why data mining? Why now?

• Confluence of 4 technical advances
  • Storage
    • Disk densities have been doubling each year
    • A $100 disk today has over 1,000,000x more capacity/$ than the disks of 30 years ago
  • Networking
    • Data can be transferred easily between collection, storage, and use
    • eBusiness systems do it as a matter of course (with fewer data errors)
  • Algorithms
    • Advanced algorithms from machine learning, pattern recognition, and applied statistics have become mature enough for mainstream use
  • Computing power
    • Processing power has been doubling every 1.5 years or so (Moore’s law)
    • Laptops are more powerful than the supercomputers of yesteryear

• Note: all four of these are essential for effective, successful data mining
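The doubling rates quoted above can be turned into growth factors with a little arithmetic. The sketch below is illustrative only: the function name `growth_factor` and the 30-year horizon plugged into it are assumptions chosen to match the figures on the slide, not part of the original material.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """How many times a quantity multiplies if it doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Doubling every year for 30 years: 2**30, roughly a billion-fold.
    print(f"doubling yearly, 30 years:        {growth_factor(30, 1.0):,.0f}x")
    # Doubling every 1.5 years for 30 years: 2**20, roughly a million-fold.
    print(f"doubling every 1.5 yrs, 30 years: {growth_factor(30, 1.5):,.0f}x")
```

Either rate, compounded over three decades, yields at least a million-fold improvement, which is the order of magnitude the storage bullet quotes.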

What really is Data Mining?

A process for using information technology to extract useful (non-trivial, hopefully actionable) knowledge from large bodies of data

• A set of principles, concepts, and techniques that structure thinking and analysis of data

• Extracts useful information and knowledge from large volumes of data by following a process with reasonably well defined steps

• Changes the way you think about data and its role in business

Data Opportunities

• Volume of data

• Variety of data

• Powerful computers

• Better algorithms

Data Mining Process Outline

• Business Understanding

• Data Understanding

• Data Preparation

• Modeling

• Evaluation

• Deployment
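The slides list these phases without detail. As a rough illustration of the middle three (data preparation, modeling, evaluation), here is a minimal sketch on invented churn records. Everything in it is an assumption made for the example: the data, the single feature (support calls), the 6/3 train/test split, and the simple threshold rule standing in for a real model.

```python
# Hypothetical records: (support_calls_last_month, churned). None marks a missing value.
raw = [
    (0, False), (1, False), (5, True), (None, True), (4, True),
    (2, False), (6, True), (1, False), (3, True), (0, False),
]

# Data preparation: drop records with a missing feature value.
clean = [(x, y) for x, y in raw if x is not None]

# Hold out the last records so evaluation uses data the model has not seen.
train, test = clean[:6], clean[6:]


def accuracy(threshold, rows):
    """Fraction of rows where the rule 'churn if calls >= threshold' is correct."""
    return sum((x >= threshold) == y for x, y in rows) / len(rows)


def fit_threshold(rows):
    """Modeling: pick the threshold with the best accuracy on the given rows."""
    candidates = sorted({x for x, _ in rows})
    return max(candidates, key=lambda t: accuracy(t, rows))


if __name__ == "__main__":
    threshold = fit_threshold(train)
    print("chosen threshold:", threshold)
    print("training accuracy:", accuracy(threshold, train))
    # Evaluation: held-out accuracy is the number that matters.
    print("held-out accuracy:", accuracy(threshold, test))
```

On this toy data the rule fits the training records perfectly but makes a mistake on the held-out ones, which is why evaluation is a separate phase rather than a by-product of modeling.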

Data Mining Process

Business data mining is a process

[Figure labels: Science, Craft, Creativity, Common Sense, Process]

Mini case: What data might TelCo mine to help with churn management?

Types of Data Mining Tasks

Many business problems have as an important component one of these data mining tasks:

• Affinity grouping (a.k.a. “associations”, “market-basket analysis”)
  • What items are commonly purchased together?

• Similarity Matching
  • What other companies are like our best small business customers?

• Description/Profiling
  • What does “normal behavior” look like?

• Clustering
  • Do my customers form natural groups?

• Predictive Modeling (including causal modeling & link prediction)
  • Will customer X churn next month/default on her loan?
  • How much would prospect X spend?
  • Who might be good “friends” on our social networking site?

[Slide margin labels: Unsupervised, Supervised]
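To make the first task concrete, the question "What items are commonly purchased together?" can be approached by counting how often each pair of items shares a basket. This is a minimal sketch, not a full association-rule algorithm; the baskets are invented and the helper names `pair_counts` and `support` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, invented purely for illustration.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
]


def pair_counts(baskets):
    """Count, for every unordered pair of items, the baskets that contain both."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical (alphabetical) form.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def support(pair, baskets):
    """Fraction of all baskets that contain both items of `pair`."""
    return pair_counts(baskets)[tuple(sorted(pair))] / len(baskets)


if __name__ == "__main__":
    for pair, n in pair_counts(baskets).most_common(3):
        print(pair, n)
    print("support(bread, milk) =", support(("bread", "milk"), baskets))
```

No target variable is specified here: the pairs emerge from the data alone, which is what distinguishes this kind of task from predictive modeling.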

This is NOT a course about…

• Statistics

• Database Querying
  • SQL

• Data Warehousing

• Regression Analysis
  • Explanatory vs. Predictive Modeling

Data Mining versus…

• Data Warehousing / Storage
  • Data warehouses coalesce data from across an enterprise, often from multiple transaction-processing systems

• Querying / Reporting (SQL, Excel, QBE, other GUI-based querying)
  • Very flexible interface to ask factual questions about data
  • No modeling or sophisticated pattern finding
  • Most of the cool visualizations

• OLAP – On-line Analytical Processing
  • OLAP provides easy-to-use GUI to explore large data collections
  • Exploration is manual; no modeling
  • Dimensions of analysis preprogrammed into OLAP system
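As an illustration of what "asking factual questions" looks like in practice, the sketch below runs a SQL query through Python's built-in `sqlite3` module. The `orders` table, its columns, and the rows are all invented for this example; `most_profitable` is a hypothetical helper.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 120.0, 80.0),
        ("bob", 300.0, 150.0),
        ("alice", 60.0, 30.0),
        ("carol", 90.0, 85.0),
    ],
)


def most_profitable(conn, limit=2):
    """Return (customer, total profit) rows, highest profit first."""
    return conn.execute(
        "SELECT customer, SUM(revenue - cost) AS profit "
        "FROM orders GROUP BY customer "
        "ORDER BY profit DESC LIMIT ?",
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    for customer, profit in most_profitable(conn):
        print(customer, profit)
```

The query reports what is already recorded in the data. It does not build a model or discover a pattern the analyst did not ask for, which is the contrast with data mining drawn above.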

Data Mining versus…

• Traditional statistical analysis
  • Mainly based on hypothesis testing or estimation / quantification of uncertainty
  • Should be used to follow up on data mining’s hypothesis generation

• Automated statistical modeling (e.g., advanced regression)
  • This is one type of data mining, usually based on linear models
  • Massive databases allow non-linear alternatives
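To show what following up a data-mining hypothesis with a test might look like, here is a small exact permutation test for a difference in means. The slides do not name a specific test; the permutation test is one option chosen for this sketch because it needs only the standard library. The spending figures and group labels are invented.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in group means.

    Enumerates every way to relabel the pooled values into groups of the
    original sizes and returns the share of relabelings whose mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        n_splits += 1
    return extreme / n_splits


if __name__ == "__main__":
    # Hypothetical monthly spend for two customer groups.
    flagged = [90, 95, 100, 110]   # group a pattern-finding step singled out
    others = [40, 50, 55, 60]
    print("p-value:", exact_permutation_p_value(flagged, others))
```

A small p-value suggests the difference between the groups is unlikely to be a relabeling accident. Enumerating every split is practical only for small samples like this one.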

Answering business questions with these techniques…

• Who are the most profitable customers?
  • Database querying

• Is there really a difference between profitable customers and the average customer?
  • Statistical hypothesis testing

• But who really are these customers? Can I characterize them?
  • OLAP (manual search), Data mining (automated pattern finding)

• Will some particular new customer be profitable? How much revenue should I expect this customer to generate?
  • Data mining (predictive modeling)
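The last question differs from the others because it concerns a customer who is not yet in the data. A minimal sketch of that idea: fit a straight line to past customers by ordinary least squares, then use it to estimate revenue for a new one. The feature, the numbers, and the function names are assumptions for illustration; the slides do not prescribe this model.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Estimated outcome for a new value of the feature."""
    return intercept + slope * x


if __name__ == "__main__":
    # Hypothetical history: purchases in a customer's first month,
    # and that customer's first-year revenue (in hundreds of dollars).
    first_month_purchases = [1, 2, 3, 4, 5]
    first_year_revenue = [2, 4, 5, 4, 5]

    slope, intercept = fit_line(first_month_purchases, first_year_revenue)
    # A new customer with 6 first-month purchases was never observed.
    print("expected revenue:", predict(6, slope, intercept))
```

A query can only report revenue already recorded; the fitted line generalizes from past customers to an unseen one, which is the sense of "predictive modeling" in the last bullet.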