CISB594 – Business Intelligence
Data Mining, Part I
Reference
• Materials used in this presentation are extracted mainly from the following texts, unless stated otherwise.
Objectives
At the end of this lecture, you should be able to:
• Describe data mining, its characteristics and objectives in business
• Identify and explain the common algorithms used in data mining
• Discuss the use of data mining in different types of business
• Discuss the importance of data mining in understanding customers’ behaviours
What is Data Mining?
• A process that uses statistical, mathematical, artificial intelligence and machine learning techniques to extract and identify useful information and subsequent knowledge from large databases
Data mining:
– Uses sophisticated data manipulation technology
– Identifies useful information
– Deals with large databases
Data Mining Concepts and Applications
• Where is Data Mining in Business Intelligence?
Why do we need Data Mining?
• Users today want to perform statistical and mathematical analysis such as hypothesis testing, prediction and customer scoring models
• A major step in managerial decision making is forecasting or estimating the results of alternative courses of action
• Such investigation cannot be done with basic OLAP; it requires special tools for advanced business analytics, that is, data mining
Major Characteristics of Data Mining
• Data are often buried deep within very large databases, which sometimes contain data from several years
• Sophisticated tools are used to clean and synchronize data in order to get the best result
• Miners are the end users, who are empowered with sophisticated tools to ask ad-hoc questions; they need not be technically equipped
• Miners may find an unexpected result during data mining activities, and acting on it requires creative thinking in the users’ decision making
Data Mining algorithms
Fall into four broad categories:
1. Classification
– Also known as supervised induction; the most common of all data mining activities
– Used to analyse the historical data stored in a database and to automatically generate a model that can predict future behaviour
– Identifies patterns in the data that mark a record as belonging to a certain category
– Application example: target marketing (likely customer or no hope, based on previous customers’ behaviour)
Medical insurance company: Clients with a family history of diabetes (on the maternal or paternal side) are likely to develop diabetes later in life. A special premium coverage can be designed for the potential health condition.
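The target marketing example can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a k-nearest-neighbour vote as the classifier (the slides do not name a specific algorithm), and the customer records, the two features (age, purchases last year) and the function name `classify` are all hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical historical customers: (age, purchases last year) -> known outcome.
HISTORY = [
    ((25, 1), "no hope"),
    ((30, 2), "no hope"),
    ((35, 1), "no hope"),
    ((40, 8), "likely customer"),
    ((45, 9), "likely customer"),
    ((50, 7), "likely customer"),
]

def classify(record, history=HISTORY, k=3):
    """Label a new record by majority vote among its k nearest labelled neighbours."""
    nearest = sorted(history, key=lambda item: dist(item[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Predict the category of two new, unlabelled prospects.
prediction_new = classify((42, 6))
prediction_low = classify((28, 2))
print(prediction_new, "|", prediction_low)
```

The labelled history is what makes this "supervised": the categories are known before the model is built. A real model would normally also rescale the features so that no single attribute dominates the distance.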
Data Mining algorithms
Fall into four broad categories:
2. Clustering
– Partitions a database into segments in which the members of a segment share similar qualities
– Unlike classification, the clusters are unknown when the algorithm starts
– Clustering techniques include optimization: the goal is to create groups so that members within each group have maximum similarity and members across groups have minimum similarity
– Before the results of clustering techniques are used, it might be necessary for an expert to interpret and modify the information
– Application example: market segmentation
Comb the whole data set to identify shared qualities/characteristics and create groups based on them. E.g. payment by credit card is more popular in urban areas than in rural areas. Demographically, social class determines the method of payment. This can be translated into business decisions/strategy.
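As a rough sketch of how such segments can be found without labels, the following uses plain k-means, one common clustering technique (the slides do not prescribe a particular one). The customer data, the two features and the choice to seed the centroids with the first k points are assumptions made for illustration.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = list(points[:k])  # assumption: seed with the first k points (deterministic)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it in place if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (monthly spend, share of payments made by credit card in %).
CUSTOMERS = [(20, 10), (80, 90), (25, 15), (85, 95), (22, 12), (90, 85)]

centroids, clusters = k_means(CUSTOMERS, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

The algorithm only returns groups of similar records. Deciding that one group is, say, "urban credit card users" is the interpretation step that an expert still has to perform.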
Classifying vs. Clustering
What is the major difference between cluster analysis and classification?
• Classification sorts cases into groups that are known in advance, so that members of the same group are strongly associated in some meaningful way.
• Cluster analysis is one way to identify the groups that classification requires.
Data Mining algorithms
Fall into four broad categories:
3. Association
– Establishes relationships among items that occur together in a given record
– Determines associations among items that sell together
– Often called market basket analysis, as the primary application is the analysis of sales transactions
– Application example: market basket analysis
Placing microwavable popcorn in the soft drinks aisle
Placing batteries in the toys aisle
Placing women’s magazines in the baby formula aisle
Placing lemons and marinating herbs at the butcher section of the supermarket
Selling hobs, hoods and ovens as part of kitchen cabinets
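Placement decisions like those above are usually backed by two simple measures: support (how often two items appear together) and confidence (how often a basket containing one item also contains the other). The sketch below computes both for item pairs. The baskets, the 0.4 support threshold and the function name `pair_rules` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales transactions (one set of items per basket).
BASKETS = [
    {"popcorn", "soft drink"},
    {"popcorn", "soft drink", "batteries"},
    {"toy", "batteries"},
    {"toy", "batteries", "soft drink"},
    {"popcorn", "soft drink", "toy"},
]

def pair_rules(baskets, min_support=0.4):
    """Return {(a, b): (support, confidence)} for rules 'a -> b' over item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])  # confidence of a -> b
            rules[(b, a)] = (support, count / item_counts[b])  # confidence of b -> a
    return rules

rules = pair_rules(BASKETS)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket with popcorn also contains a soft drink, which is the kind of pattern that would justify shelving the two together.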
Data Mining algorithms
Fall into four broad categories:
4. Sequence discovery
– The identification of associations over time
– Some sequence discovery techniques keep track of the elapsed time between associated events and the frequency of occurrences
– Application example: market basket analysis over time, customer life cycle analysis
Unemployed consumers who purchased prepaid telco services are most likely to convert to postpaid upon being employed
The purchase of machinery will later be followed by the purchase of a maintenance service
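The machinery example can be sketched as a simple follow-up search over a dated purchase log, tracking both the elapsed time and the frequency mentioned above. The log, the customer identifiers and the function name `follow_ups` are hypothetical. Real sequence discovery tools search for such patterns automatically rather than being told which pair of events to check.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase log: (customer, purchase date, item).
EVENTS = [
    ("c1", date(2010, 1, 10), "machinery"),
    ("c1", date(2010, 4, 10), "maintenance service"),
    ("c2", date(2010, 2, 1), "machinery"),
    ("c2", date(2010, 3, 3), "maintenance service"),
    ("c3", date(2010, 5, 5), "machinery"),
]

def follow_ups(events, first, then):
    """Days between each customer's earliest `first` event and their next `then` event."""
    by_customer = defaultdict(list)
    for customer, when, item in sorted(events, key=lambda e: e[1]):
        by_customer[customer].append((when, item))
    gaps = {}
    for customer, history in by_customer.items():
        start = next((when for when, item in history if item == first), None)
        if start is None:
            continue  # this customer never bought the first item
        later = next((when for when, item in history if item == then and when > start), None)
        if later is not None:
            gaps[customer] = (later - start).days
    return gaps

gaps = follow_ups(EVENTS, "machinery", "maintenance service")
machinery_buyers = {customer for customer, _, item in EVENTS if item == "machinery"}
frequency = len(gaps) / len(machinery_buyers)  # share of buyers who followed up
print(gaps, round(frequency, 2))
```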
Types of data mining
2 types:
Hypothesis-driven data mining
– Begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition
– E.g. start with a statement (fires during road accidents are caused by vehicle modifications made by unauthorized parties), then use data mining to validate the statement
Discovery-driven data mining
– Finds patterns, associations, and relationships among the data in order to uncover facts that were previously unknown or not even contemplated by an organization
Use in business
• Where data mining is beneficial (the intent in most of these examples is to identify a business opportunity and create a sustainable competitive advantage). Fill in the blanks.
Business: Use
– Marketing: ______
– Banking: Forecasting levels of bad loans, fraud in credit card usage, credit card spending patterns, new loans
– Retailing and sales: Predicting sales, determining correct inventory levels and distribution schedules
– Manufacturing and production: ______
Use in business
• Where data mining is beneficial (the intent in most of these examples is to identify a business opportunity and create a sustainable competitive advantage)
Business: Use
– Government and defense: Forecasting threats to national security, predicting resource consumption
– Health: ______
– Airlines: ______
– Broadcasting: ______
Understanding customer behaviour
For most retail environments, three sources of customer data are most critical to data mining efforts aimed at better understanding customer behaviour:
– Demographic data – salary, population
– Transaction data – purchase type, online, cash, credit
– Online interaction data – favourite sections of the website (clickstream analytics can be used to identify who did/did not buy a product, why and when)
Data Mining in retail
• The process of data mining in retail has three different aspects:
1. Web analytics – Gathers web statistics that track customers’ online behaviour: hits, pages, sales, volume, and so on. This helps in adjusting a web site to meet customer needs.
2. Customer analytics – Combines web site interaction data, transaction data from offline purchases, and demographic data. This is critical in CRM and revenue management because a better understanding of customers allows an organization to cluster them into groupings.
3. Optimization – Patterns can be detected and used to optimize customer interactions, for example by recommending relevant styles and complementary purchases/products to suit customer behaviour.
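The web analytics aspect can be illustrated with a very small clickstream summary: hits per page, and which visitors did or did not reach a purchase page. The click records, the page paths and the function name `web_stats` are hypothetical, and treating "/checkout" as the purchase page is an assumption made for the example.

```python
from collections import Counter

# Hypothetical clickstream: (visitor, page viewed).
CLICKS = [
    ("v1", "/home"), ("v1", "/shoes"), ("v1", "/checkout"),
    ("v2", "/home"), ("v2", "/shoes"),
    ("v3", "/home"), ("v3", "/bags"), ("v3", "/checkout"),
    ("v4", "/home"),
]

def web_stats(clicks, purchase_page="/checkout"):
    """Hits per page, plus which visitors did and did not reach the purchase page."""
    hits = Counter(page for _, page in clicks)
    visitors = {visitor for visitor, _ in clicks}
    buyers = {visitor for visitor, page in clicks if page == purchase_page}
    return hits, buyers, visitors - buyers

hits, buyers, non_buyers = web_stats(CLICKS)
conversion_rate = len(buyers) / (len(buyers) + len(non_buyers))
print(hits.most_common(), sorted(buyers), sorted(non_buyers), conversion_rate)
```

Joining the buyer and non-buyer lists with transaction and demographic data is where the customer analytics step would begin.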
Your assignment
Now ask if …
You are now able to:
• Describe data mining, its characteristics and objectives in business
• Identify and explain the common algorithms used in data mining
• Discuss the use of data mining in different types of business
• Discuss the importance of data mining in understanding customers’ behaviours
Assignment Updates
• Assignment 2 is to be submitted on 7th March 2011
• Please print the marking scheme and include it in the submission
• Teams are to book a presentation slot (to be distributed in the class)