Post on 22-Dec-2015
Knowledge Discovery Centre: CityU-SAS Partnership
Speakers:
• Prof Y V Hui, CityU
• Dr H P Lo, CityU
• Dr Sammy Yuen, CityU
• Dr K W Cheng, SAS Institute
• Mr Steven Parker, Standard Chartered
The Art and Science of Data Mining
Y V Hui, City University of Hong Kong
The Driving Forces
• Specialization and focus in business
- To satisfy the needs of customers
- To improve and develop specific business strategies and processes
- Personalization through mass customization
The Driving Forces
• Challenges
- local and global competition
- distributed business operations
- product innovation
• Technology development
• Benefit, cost and risk on a product or customer basis
Data Mining
• Also known as knowledge discovery in databases. Data mining digs out valuable information from large and messy data. (Computer scientist’s definition)
• Data mining is a knowledge discovery process. It’s the integration of business knowledge, people, information, statistics and computing technology.
Data Mining is Hot
• Ten Hottest Jobs, Time, 22 May 2000
• 10 emerging areas of technology, MIT’s Magazine of Technology Review, Jan/Feb, 2001
Data Mining Philosophy
• A powerful enabler of competitive advantage.
• Data mining is driven from business knowledge.
• Data mining is about enabling people to discover actionable information about their business.
• The return in profit isn’t about algorithms.
Scope of Data Mining
• Management’s Decision World: business outlook, industry conditions, product offering, customer analysis, strategic options, competitive actions, etc.
• Interface: problem development and management; reporting and evaluations
• Data Miner’s Analytical World: project design, data collection and preparation, model building, validation
Project Management
• Cross-functional team
• System architecture
Successful applications
• Business transaction
- risks and opportunities
• Customer relationship management
- personalization, target marketing
• Electronic commerce & web
- web mining
Successful applications
• Science & engineering
• Health care
• Multi-media
• Others
Data Mining Process
• Understanding of business and problem identification
Understanding Your Business
• Do we have a problem?
- What is the current situation? Are there any undesirable situations that need attention?
- Are there any conditions, processes, etc., that could be improved?
- Are any problems foreseeable that could affect the business?
- Are there any potential opportunities that the company may capitalize on?
A problem is a learning opportunity.
Understanding Your Problem
• Operational or analytical
• Conventional rule or knowledge discovery
• Product based or customer based
• Market research or data mining
• Ownership of the information
• Privacy
• Added value
Data Mining Process
• Understanding of business and problem identification
• Collecting relevant information
Collecting Relevant Information
• Data Search
• Data Collection
• Data Preparation
• Data Mining Database
Data Search
• Exploring the problem space. Don’t let the data drive the problem.
• Measurement
• Exploring the data sources
Data Collection
• Data retrieval
• Data audit
• Data set assembly and data warehouse
• Survey
Data Preparation
• Data representation
• Data exploration
• Data normalization
• Data transformation
• Imputation of missing data
• Data tuning
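Two of the preparation steps listed above, imputation of missing data and data normalization, can be illustrated with a minimal sketch. The slides do not say which methods were used; mean imputation and min-max scaling are assumed here only because they are the simplest common choices, and the function names and the `incomes` values are hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values (assumed method)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly onto the interval [0, 1] (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up illustrative data: one income value is missing.
incomes = [30000, None, 50000, 70000]
filled = impute_mean(incomes)
scaled = min_max_normalize(filled)
```

Imputing before normalizing keeps the filled value inside the observed range, so the scaled column still spans exactly 0 to 1.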
Data Mining Database
• Variable selection
• Record selection
• Data set partition
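Data set partition usually means splitting the records into training, validation and test sets. The sketch below assumes a 60/20/20 random split; the proportions, the `partition` function and the seed are illustrative choices, not taken from the slides.

```python
import random

def partition(records, train=0.6, valid=0.2, seed=0):
    """Shuffle records and split them into train, validation and test lists."""
    shuffled = list(records)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

# Hypothetical data set of 100 record identifiers.
train_set, valid_set, test_set = partition(range(100))
```

Every record lands in exactly one of the three sets, so the model can be validated on data it never saw during fitting.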
Data Mining Process
• Understanding of business and problem identification
• Collecting relevant information
• Model building
• Learning
Model Building
• Model based vs non-model based
$(y_1, y_2, \ldots, y_p) = f(x_1, \ldots, x_q)$
Inputs: $x_1, \ldots, x_q$
Outputs: $y_1, \ldots, y_p$
Model Building
• Estimation vs trial and error
• Directed vs undirected
• Multidimensional analysis
• Large data set vs small data set
Data Mining Algorithms
• Online Analytical Processing
- SQL query tools
• Discovery Driven Methods
- Description: visualization, clustering, association, sequential analysis
- Prediction: classification, regressions, decision trees, neural networks
Online Analytical Processing
• Query and reporting
Example of SQL query: How many credit-card customers who made purchases of over $1,000 on sporting goods in December have at least $20,000 of available credit?
• Manual and validation driven
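The example question above can be written as an actual SQL query. The sketch below runs it with Python's built-in `sqlite3` module against a hypothetical schema: the `customers` and `purchases` tables, their columns and the sample rows are all assumptions made for illustration. It also assumes that "purchases of over $1,000" means total December spending on sporting goods, not a single purchase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and made-up rows, for illustration only.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, available_credit REAL);
CREATE TABLE purchases (customer_id INTEGER, category TEXT,
                        amount REAL, purchase_date TEXT);
INSERT INTO customers VALUES (1, 25000), (2, 15000), (3, 40000);
INSERT INTO purchases VALUES
  (1, 'sporting goods', 1200, '2000-12-05'),
  (2, 'sporting goods', 1500, '2000-12-10'),
  (3, 'sporting goods',  300, '2000-12-12'),
  (3, 'groceries',      2000, '2000-12-15');
""")

QUERY = """
SELECT COUNT(*)
FROM customers AS c
WHERE c.available_credit >= 20000
  AND (SELECT COALESCE(SUM(p.amount), 0)
       FROM purchases AS p
       WHERE p.customer_id = c.id
         AND p.category = 'sporting goods'
         AND strftime('%m', p.purchase_date) = '12') > 1000
"""
count = conn.execute(QUERY).fetchone()[0]
```

The analyst has to know in advance which conditions to ask about, which is what makes this style of analysis manual and validation driven.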
Estimation and Prediction
• Statistical models
• Neural network
Example: Housing price valuation model
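As a minimal sketch of a statistical estimation model, the code below fits a straight line by ordinary least squares to predict price from floor area. A real housing valuation model would use many more inputs; the single predictor, the `fit_line` function and the data values are assumptions chosen to keep the arithmetic easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: floor area in square feet, price in millions.
areas = [500, 700, 900, 1100]
prices = [2.5, 3.5, 4.5, 5.5]

slope, intercept = fit_line(areas, prices)
predicted = slope * 800 + intercept  # estimated price of an 800 sq ft flat
```

Because the made-up points lie exactly on a line, the fitted slope is 0.005 per square foot and the 800 sq ft prediction is 4.0, up to floating-point rounding.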
Classification Algorithms
• Statistical techniques
• Neural networks
• Genetic algorithms
• Nearest neighbor method
• Rule induction and decision tree
Example: Customer segmentation and buying behavior description
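Of the techniques listed, the nearest neighbor method is the shortest to sketch: a new customer receives the majority label of the k most similar known customers. The features (age, monthly spend), the segment labels and the `knn_predict` function are hypothetical, and the features are left unscaled for brevity, so the larger-valued feature dominates the distance.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up customers: (age, monthly spend) with a segment label.
train = [
    ((25, 200), "occasional"), ((30, 250), "occasional"), ((28, 220), "occasional"),
    ((45, 900), "frequent"), ((50, 950), "frequent"), ((48, 1000), "frequent"),
]

label = knn_predict(train, (47, 920))
```

In practice the inputs would be normalized first so that no single feature dominates the distance.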
Association Rules
• Apriori algorithm
Example: Market basket analysis, cross selling analysis
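The frequent-itemset stage of the Apriori algorithm can be sketched on a toy market basket as follows. This is a compact illustration, not an optimized implementation: the baskets and the `min_support` threshold are made up, and the second stage of deriving rules from the frequent itemsets is omitted.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

# Made-up shopping baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(baskets, min_support=2)
```

The prune step is what keeps the search tractable: the three-item candidate is discarded without being counted because milk and butter together appear in only one basket.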
Sequential Analysis
• Count-all algorithm
• Count-some algorithm
Example: Attached mailing, add-on sales
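The quantity that sequential analysis counts is the support of an ordered pattern: the number of customers whose purchase history contains the pattern's items in that order. The sketch below shows only this counting step under made-up data; it is not the count-all or count-some algorithm, and the function names and `histories` are hypothetical.

```python
def contains_in_order(sequence, pattern):
    """True if the pattern's items occur in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # Each membership test consumes the iterator up to the match,
    # so later items must appear after earlier ones.
    return all(item in it for item in pattern)

def support(histories, pattern):
    """Number of purchase histories that contain the ordered pattern."""
    return sum(1 for history in histories if contains_in_order(history, pattern))

# Made-up purchase histories, one list per customer, in time order.
histories = [
    ["printer", "ink", "paper"],
    ["printer", "paper"],
    ["ink", "printer"],
    ["printer", "cable", "ink"],
]
printer_then_ink = support(histories, ["printer", "ink"])
```

A pattern with high support, such as a printer followed later by ink, is the kind of finding that suggests an add-on sales offer.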
Algorithms Comparison
• No single data mining algorithm outperforms all the others. Try different algorithms and draw conclusions from the results. Use your business knowledge.
• Neural networks do no better than statistical models when the underlying structure is known. However, neural networks detect hidden interactions and nonlinearity. Use the prior information if available.
Algorithms Comparison
• Data mining algorithms cannot handle dependent records. Use the prior information. Statistical models help.
• Data tuning and dimension reduction enhance data mining before and after the analysis. Statistical techniques help.
Data Mining Process
• Understanding of business and problem identification
• Collecting relevant data
• Model building
• Business strategy and evaluation
• Learning
• Action
Trends that Affect Data Mining
• Data trends
- data explosion
- data types
Trends that Affect Data Mining
• Hardware trends
- memory
- processing speed
- storage
Trends that Affect Data Mining
• Network trends
- network connectivity
- distributed databases
• Wireless communication
Trends that Affect Data Mining
• Scientific computing trends
- theory, experiment and simulation
Trends that Affect Data Mining
• Business trends
- total quality management
- customer relationship management
- business process reengineering
- enterprise resource planning
- supply chain management
- business intelligence and knowledge management
- e-business and m-business
Trends that Affect Data Mining
• Privacy and Security
Pot of Gold
• The benefits of knowing one’s business and customers have become so critical that technologies are coming together to support data mining.
• Data mining is not cybernetic magic that will turn your data into gold. It’s the process and result of knowledge production, knowledge discovery and knowledge management.