Chapter 1: Initial Description of Data Mining in Business
Prepared by: Dr. Tsung-Nan Tsai

Contents
Introduces data mining concepts
Presents typical business data applications
Explains the meaning of key concepts
Gives a brief overview of data mining tools
Outlines the remaining chapters of the book

Definition
DATA MINING: exploration & analysis, by automatic means, of large quantities of data to discover actionable patterns & rules
Refers to the analysis of the large quantities of data that are stored in computers.
Data mining is a way to use the massive quantities of data that businesses generate.
GOAL - improve marketing, sales, and customer support through better understanding of customers

Retail Outlets
Bar coding & scanning generate masses of data
• customer service (grocery stores can quickly process the purchases and accurately determine product prices)
• inventory control (determine the quantity of each product on hand; supply chain management)
MICROMARKETING
CUSTOMER PROFITABILITY ANALYSIS
MARKET-BASKET ANALYSIS

Political Data Mining
Grossman et al., 10/18/2004, Time, 38
2004 Election
Republicans: VoterVault
• From mid-1990s
• About 165 million voters
• Massive get-out-the-vote drive for those expected to vote Republican
Democrats: Demzilla
• Also about 165 million voters
• Names typically have 200 to 400 information items

Medical Diagnosis
J. Morris, Health Management Technology Nov 2004, 20, 22-24
Electronic Medical Records
Associated Cardiovascular Consultants
• 31 physicians
• 40,000 patients per year, southern New Jersey
Data mined to identify efficient medical practice
• Enhance patient outcomes
• Reduced medical liability insurance

Mayo Clinic
Swartz, Information Management Journal Nov/Dec 2004, 8
IBM developed EMR program
Complete records on almost 4.4 million patients.
Doctors can ask how the last 100 Mayo patients with the same gender, age, and medical history responded to particular treatments.

Business Uses of Data Mining
Toyota used data mining of its data warehouse to determine more efficient transportation routes, reducing time-to-market by an average of 19 days.
Banks used data mining in soliciting credit card customers.
Insurance and telecommunication companies used DM to detect fraud.
Manufacturing firms used DM in quality control.
Many …..

Business Uses of Data Mining
1. Customer profiling
• Identify profitable subsets of customers
2. Targeting
• Determine characteristics of most profitable customers
3. Market-Basket Analysis
• Determine correlation of purchases by profile (customers)
• Cross-selling
• Part of Customer Relationship Management

What is needed to do DM?
DM requires the identification of a problem, along with data collection that can lead to a better understanding of the market.
Computer models provide statistical or other means of analysis.
Two general types of DM studies:
1. Hypothesis testing: expressing a theory about the relationship between actions and outcomes.
2. Knowledge discovery: a preconceived notion may not be present; rather, relationships are identified by looking at the data (correlation analysis).

Reasons why Data Mining is now effective
Data are there
Data are warehoused (computerized)
• Walmart: 35 thousand queries per week
Computing economically available
Competitive pressure
Commercial products available

Trends
Every business is service
• hotel chains record your preferences
• car rental companies the same
service versus price
• credit card companies
• long distance providers
• airlines
• computer retailers

Trends
Information as Product
Custom Clothing Technology Corporation
• fit jeans, other clothing
INFORMATION BROKERING
• IMS - collects prescription data from pharmacies, sells to drug firms
• AC Nielsen - TV

Trends
Commercial Software Available
using statistical, artificial intelligence tools that have been developed
• Enterprise Miner (SAS)
• Intelligent Miner (IBM)
• Clementine (SPSS)
• PolyAnalyst (Megaputer)
• Specialty products

Fingerhut’s DM models
Fingerhut used segmentation, decision tree, regression analysis, and neural network modeling tools, with SAS for regression analysis and SPSS for neural networks.
The segmentation model combines order and basic demographic data with Fingerhut’s product offerings.
Neural network models were used to identify patterns in mailings and in the filling of telephone orders.
Goal:
• Create new mailings targeted at customers with the greatest potential payoff.
• Create a catalog containing products that those customers are interested in, such as furniture, telephones…

How Data Mining Is Being Used
U.S. Government
• track down Oklahoma City bombers, Unabomber, many others
• Treasury department - international funds transfers, money laundering
• Internal Revenue Service

How Data Mining Is Used
Firefly
• asks members to rate music and movies
• subscribers clustered
• clusters get custom-designed recommendations

Warranty Claims Routing
Diesel engine manufacturer
• stream of warranty claims
• examine each by expert
• determine whether charges are reasonable & appropriate
• think of expert system to automate claims processing

Data mining application areas

| Application Area | Applications | Specifics |
| --- | --- | --- |
| Retailing | Affinity positioning | Position products effectively |
| Retailing | Cross-selling | Find more products for customers |
| Banking | Customer relationship management | Identify customer value; develop programs to maximize revenue |
| Credit card management | Lift | Identify effective market segments |
| Credit card management | Churn | Identify likely customer turnover |
| Credit card management | Fraud detection | |
| Insurance | Fraud detection | Identify claims meriting investigation |
| Telecommunications | Churn | Identify likely customer turnover |
| Telemarketing | Online information | Aid telemarketers with easy data access |
| Human resource management | Churn | Identify potential employee turnover |

Retailing
Affinity positioning is based upon the identification of products that the same customer is likely to want.
• Cold medicine and tissues
Cross-selling: knowledge of products that go together can be used to market the complementary product.
• Grocery stores do that through product shelf positioning.
Grocery stores generate mountains of cash register data. Current technology enables grocers to look at customers who have defected from a store, their purchase history, and characteristics of other potential defectors.
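To make affinity positioning concrete, the sketch below counts how often pairs of products appear in the same basket and reports support and lift for each pair. The baskets, the function name `pair_stats`, and the choice of support and lift as measures are illustrative assumptions, not something taken from the slides.

```python
from collections import Counter
from itertools import combinations

def pair_stats(baskets):
    """Return support and lift for every pair of items bought together."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore repeats within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    stats = {}
    for (a, b), count in pair_counts.items():
        support = count / n                  # share of baskets holding both items
        expected = (item_counts[a] / n) * (item_counts[b] / n)
        stats[(a, b)] = {"support": support, "lift": support / expected}
    return stats

# Hypothetical cash register data, one list per customer basket.
baskets = [
    ["cold medicine", "tissues", "juice"],
    ["cold medicine", "tissues"],
    ["bread", "juice"],
    ["bread", "milk"],
]
stats = pair_stats(baskets)
print(stats[("cold medicine", "tissues")])
```

A lift above 1 suggests the two products sell together more often than chance would predict, which is the kind of signal a grocer might use when deciding shelf positions.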

Cross-selling
USAA insurance
• doubled number of products held by average customer due to data mining
• detailed records on customers
• predict products they might need
Fidelity Investments
• regression - what makes customer loyal

Banking
CRM involves the application of technology to monitor customer service, a function that is enhanced through data mining support.
DM applications in finance include predicting the prices of equities involving a dynamic environment with surprise information, some of which might be inaccurate …
Only 3% of the customers at Norwest Bank provided 44% of its profits.
CRM products enable banks to define and identify customer and household relationships.

Retaining Good Customers
Customer loss:
• Banks - Attrition
• Cellular Phone Companies - Churn
study who might leave, why
Southern California Gas
– customer usage, credit information
– direct mail contact - most likely best billing plan
– who is price sensitive
Who should get incentives, whom to keep

Credit card management
Bank credit card marketing promotions typically generate 1,000 responses to mailed solicitations, a response rate of about 1%. The rate is improved significantly through data mining analysis.
DM tools used by banks include credit scoring, which is a quantified analysis of credit applicants with respect to predictions of on-time loan repayment (data covering deposits, savings, loans, credit cards, insurance…).
These credit scores can be used to make accept/reject recommendations, as well as to establish the size of a credit line.
ATMs could be rigged up with electronic sales pitches for products that a particular customer is likely to be interested in.
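The response-rate arithmetic can be sketched as follows. A 1% rate with 1,000 responses implies roughly 100,000 solicitations mailed. The targeted campaign figures (300 responses from 10,000 mailings) are hypothetical, and lift is computed here as the ratio of the targeted rate to the baseline rate, which is one common way to express the improvement.

```python
def response_rate(responses, mailed):
    """Fraction of mailed solicitations that produced a response."""
    return responses / mailed

def lift(targeted_rate, baseline_rate):
    """How many times better the targeted mailing does than the baseline."""
    return targeted_rate / baseline_rate

# Baseline from the slide: 1,000 responses at about 1% implies ~100,000 mailed.
baseline = response_rate(1_000, 100_000)

# Hypothetical campaign mailed only to customers a model scored as likely responders.
targeted = response_rate(300, 10_000)

campaign_lift = lift(targeted, baseline)
print(f"baseline {baseline:.1%}, targeted {targeted:.1%}, lift about {campaign_lift:.1f}")
```

Under these assumed numbers the targeted mailing reaches a tenth as many people yet responds at about three times the baseline rate.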

Fairbank & Morris
Credit card company’s most valuable asset: INFORMATION ABOUT CUSTOMERS
Signet Banking Corporation
• obtained behavioral data from many sources
• built predictive models
• aggressively marketed balance transfer card
First Union
• who will move soon - improve retention

Telecommunications
Retention of customers for telemarketing is very difficult. The phenomenon of a customer switching carriers is referred to as churn, a fundamental concept in telemarketing as well as in other fields.
A communications company estimated that 1/3 of churn is due to poor call quality, and up to ½ is due to poor equipment.
A cellular fraud prevention system monitors traffic to spot problems with faulty telephones. When a telephone begins to go bad, telemarketing personnel are alerted to contact the customer and suggest bringing the equipment in for service.
Another way to reduce churn is to protect customers from subscription and cloning (duplication) fraud. Fraud prevention systems provide verification that is transparent to legitimate subscribers.

Human resource management
Business intelligence is a way to truly understand markets, competitors, and processes.
Software technology such as data warehouses, data marts, online analytical processing (OLAP), and data mining can be used to improve a firm’s profitability.
In HRM, the analysis can lead to the identification of individuals who are liable to leave the company unless additional compensation or benefits are provided.
HRM would identify the right people so that organizations could treat them well and retain them (reduce churn).

Methodology and Tools
Analyzing data
• Given management goals and that management can translate knowledge into action

Basic Styles
Top-Down: HYPOTHESIS TESTING
• SUPERVISED
• have a theory, experiment to prove or disprove
• SCIENCE
Bottom-Up: KNOWLEDGE DISCOVERY
• UNSUPERVISED
• start with data, see new patterns
• CREATIVITY

Hypothesis Testing
Generate theory
Determine data needed
Get data
Prepare data
Build computer model
Evaluate model results
• confirm or reject hypotheses

Generate Theory
Systematically tie different input sources together (MENTAL MODEL)
What causes sales volume?
• sales rep performance
• economy, seasonality
• product quality, price, promotion, location

Generate Theory
Brainstorm:
• diverse representatives for broad coverage of perspectives (electronic)
• keep under control (keep positive)
• generate testable hypotheses

Define Data Needed
Determine data needed to test hypothesis
• Lucky - query existing database
• More often - gather: pull together from diverse databases, survey, buy

Locate Data
Usually scattered or unavailable
Sources:
• warranty claims
• point-of-sale data (cash register records)
• medical insurance claims
• telephone call detail records
• direct mail response records
• demographic data, economic data
PROFILE: counts, summary statistics, cross-tabs, cleanup
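Profiling a data source can be as simple as counting values and cross-tabulating two fields. The sketch below does this with the standard library only; the records, the field names, and the helper names `count_values` and `crosstab` are hypothetical.

```python
from collections import Counter

# Hypothetical direct mail response records; field names are illustrative.
records = [
    {"region": "north", "responded": "yes"},
    {"region": "north", "responded": "no"},
    {"region": "south", "responded": "no"},
    {"region": "south", "responded": "no"},
    {"region": "north", "responded": None},   # missing value, a cleanup candidate
]

def count_values(records, field):
    """Count how often each value of one field occurs, missing values included."""
    return Counter(r[field] for r in records)

def crosstab(records, row_field, col_field):
    """Count records for each (row value, column value) combination."""
    return Counter((r[row_field], r[col_field]) for r in records)

region_counts = count_values(records, "region")
response_by_region = crosstab(records, "region", "responded")
missing_responses = count_values(records, "responded")[None]
```

Counting the `None` entries is a cheap way to see how much cleanup a field needs before modeling.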

Prepare Data for Analysis
Summarize:
• too much - no discriminant information
• too little - swamped with useless detail
Process for computer: ASCII, spreadsheet
Data encoding: how data are recorded can vary - may have been collected with specific purpose
Textual data: avoid if possible (may need to code)
Missing values: missing salary - use mean?
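The "use mean?" option for a missing salary can be sketched as below. Missing entries are represented as `None`, and the salary figures are made up for illustration. Mean imputation is only one possible treatment; it keeps the average unchanged but understates the spread of the data.

```python
from statistics import mean

def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("no known values to average")
    fill = mean(known)
    return [fill if v is None else v for v in values]

# Hypothetical salary column with two missing entries.
salaries = [40_000, None, 60_000, 50_000, None]
filled = impute_mean(salaries)
```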

Build and Evaluate Model
Build Computer Model
• Choose the appropriate modeling tools and algorithms
• Training and test data sets
Determine if hypotheses supported
• statistical practice
• test rule-based systems for accuracy
Requires both business and analytic knowledge

SUPERVISED
Dorn, National Underwriter Oct 18, 2004, 34, 39
Health care fraud
Use statistics to identify indicators of fraud or abuse
• Can rapidly sort through large databases
• Identify patterns different from norm
Moderately successful
• But only effective on schemes already detected
• To benefit firm, need to identify fraud before paying claim

Knowledge Discovery
Machine learning?
• Usually need intelligent analyst
Directed: explain value of some variable
Undirected: no dependent variable selected
• identify patterns
Use undirected to recognize relationships; use directed to explain once found

Directed
Goal-oriented
Examples:
• If discount applies, impact on products
• who is likely to purchase credit insurance?
• Predicted profitability of new customer
• what to bundle with a particular package
Identify sources of preclassified data
Prepare data for analysis
Build & train computer model
Evaluate

Identify Data Sources
Best - existing corporate data warehouse
• data clean, verified, consistent, aggregated
Usually need to generate
• most data in form most efficient for designed purpose
• historical sales data often purged for dormant customers (but you need that information)

Prepare Data
Put in needed format for computer
Make consistent in meaning
Need to recognize what data are missing
• change in balance = new – old
• add missing but known-to-be-important data
Divide data into training, test, evaluation
Decide how to treat outliers
• statistically biasing, but may be most important
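Dividing data into training, test, and evaluation sets might look like the sketch below. The 60/20/20 proportions, the fixed seed, and the function name `split_three_ways` are assumptions chosen for illustration; the slides do not prescribe particular proportions.

```python
import random

def split_three_ways(records, train=0.6, test=0.2, seed=0):
    """Shuffle records and split them into training, test, and evaluation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)    # seeded so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    training = shuffled[:n_train]
    test_set = shuffled[n_train:n_train + n_test]
    evaluation = shuffled[n_train + n_test:]  # whatever remains
    return training, test_set, evaluation

# Hypothetical data: 100 customer record ids.
training, test_set, evaluation = split_three_ways(range(100))
```

Shuffling before splitting guards against any ordering in the source file (for example, records sorted by date) leaking into one of the sets.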

Build & Train Model
Regression - human builds (selects IVs)
Automatic systems train
• give it data, let it hammer
OVERFITTING: fit the data
TEST SET: a means to evaluate model against data not used in training
• tune weights before using to evaluate

Evaluate Model
ERROR RATE: proportion of classifications in evaluation set that were wrong
too little training: poor fit on training data and poor error rate
optimal training: good fit on both
too much training: great fit on training data and poor error rate
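The error rate and the three training outcomes above can be expressed as a small sketch. The `acceptable` threshold of 10% and the function name `diagnose_training` are hypothetical; in practice what counts as a good or poor error rate depends on the problem.

```python
def error_rate(predicted, actual):
    """Proportion of classifications that were wrong."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)

def diagnose_training(train_error, evaluation_error, acceptable=0.10):
    """Map the two error rates onto the three cases listed above.

    `acceptable` is an assumed cut-off between a good and a poor error rate.
    """
    if train_error > acceptable:
        return "too little training"     # poor fit even on the training data
    if evaluation_error > acceptable:
        return "too much training"       # great fit on training data, poor on new data
    return "optimal training"            # good fit on both

# Hypothetical evaluation set: one of four classifications is wrong.
predicted = ["churn", "stay", "stay", "churn"]
actual = ["churn", "stay", "churn", "churn"]
evaluation_error = error_rate(predicted, actual)
verdict = diagnose_training(train_error=0.02, evaluation_error=evaluation_error)
```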

Undirected Discovery
What items sell together? Strawberries & cream
• Directed: What items sell with tofu? tabasco
Long distance caller market segmentation
• Uniform usage - weekday & weekend, spikes on holidays
• After segmentation: high & uniform except for several months of nothing

UNSUPERVISED
Dorn, National Underwriter Oct 18, 2004, 34, 39
Health care fraud
Look at historical claim submissions
• Build ad hoc model to compare with current claims
• Assign similarity score to fraudulent claims
• Predict fraud potential

Undirected Process
Identify data sources
Prepare data
Build & train computer model
Evaluate model
Apply model to new data
Identify potential targets for undirected
Generate new hypotheses to test

Generate hypotheses
Any commonalities in data?
Are they useful?
• Many adults watch children’s movies
• chaperones are an important market segment
• they probably make final decision
When hypothesis is generated, that determines data needed

Bank Case Study
Directed knowledge discovery to recognize likely prospects for home equity loan
• training set - current loan holders
• developed model for propensity to borrow
• got continuous scores, ranked customers
• sent top 11% material
Undirected: segmented market into clusters
• in one, 39% had both business & personal accounts
• cluster had 27% of the top 11%
Hypothesis: people use home equity to start business
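The ranking step in this case study can be sketched as follows: score every customer, keep the top 11%, then check what share of the selected customers falls in a given cluster. The scores, customer ids, and cluster membership below are invented for illustration; only the 11% cut-off comes from the slide.

```python
def top_fraction(scores, fraction):
    """Return customer ids ranked by score, keeping only the top fraction."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = round(len(ranked) * fraction)
    return ranked[:keep]

def share_in_cluster(selected, cluster_members):
    """Fraction of the selected customers that belong to one cluster."""
    hits = sum(1 for customer in selected if customer in cluster_members)
    return hits / len(selected)

# Hypothetical propensity-to-borrow scores for 100 customers.
scores = {f"c{i}": i / 100 for i in range(100)}
selected = top_fraction(scores, 0.11)

# Hypothetical cluster of customers holding both business and personal accounts.
business_cluster = {"c99", "c98", "c97", "c10"}
share = share_in_cluster(selected, business_cluster)
```

A cluster that holds a disproportionate share of the top-scored customers is the kind of pattern that led to the hypothesis above.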

Data mining products and data sets
A good source to view current DM products is www.KDnuggets.com.
The UCI Machine Learning Repository is a source of very good data mining datasets at www.ics.uci.edu/~mlearn/MLOther.html.
Weka DM software at http://www.cs.waikato.ac.nz/ml/weka/
Tanagra DM software at http://eric.univ-lyon2.fr/~ricco/tanagra/index.html