Data Mining
By: Tung, Sze Ming (Leo)
CS 157B
Definition
A class of database applications that analyze data in a database using tools that look for trends or anomalies.
IBM was one of the early pioneers of data mining.
Purpose
To find hidden patterns or previously unknown relationships in a body of data that can be used to predict future behavior.
Ex: Data mining software can help retail companies find customers with common interests.
Background Information
Many of the techniques used by today's data mining tools have been around for many years, having originated in the artificial intelligence research of the 1980s and early 1990s.
Data mining tools are only now being applied to large-scale database systems.
The Need for Data Mining
The amount of raw data stored in corporate data warehouses is growing rapidly.
There is too much data, and too much complexity, that might be relevant to a specific problem.
Data mining promises to bridge the analytical gap by giving knowledge workers the tools to navigate this complex analytical space.
The Need for Data Mining, cont’
The need for information has resulted in the proliferation of data warehouses that integrate information from multiple sources to support decision making.
These warehouses often include data from external sources, such as customer demographics and household information.
Approaches to Data Mining
association
sequence-based analysis
clustering
classification
Association
Classic market-basket analysis treats the purchase of a number of items (for example, the contents of a shopping basket) as a single transaction.
This information can be used to adjust inventories, modify floor or shelf layouts, or introduce targeted promotional activities to increase overall sales or move specific products.
Example: 80 percent of all transactions in which beer was purchased also included potato chips.
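The 80 percent in the beer example is what association-rule mining usually calls the rule's confidence; the rule's support is the share of all transactions that contain both items. The following is a minimal Python sketch, assuming each transaction is represented as a set of item names. The helper `rule_stats` and the basket data are hypothetical, chosen so that the beer → potato chips rule comes out at 80 percent.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = fraction of all transactions containing both item sets
    confidence = fraction of transactions containing the antecedent
                 that also contain the consequent
    """
    a = set(antecedent)
    both = a | set(consequent)
    n_total = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence


# Hypothetical baskets: beer appears in 5 transactions, 4 of which also contain chips.
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "chips"},
    {"beer", "chips", "diapers"},
    {"beer", "bread"},
    {"milk", "bread"},
    {"chips", "soda"},
    {"milk", "eggs"},
]

support, confidence = rule_stats(baskets, {"beer"}, {"chips"})
print(f"support={support:.2f} confidence={confidence:.2f}")  # support=0.50 confidence=0.80
```

A real association-mining tool would search for all rules above chosen support and confidence thresholds rather than scoring one rule supplied by hand.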
Sequence-based analysis
Traditional market-basket analysis deals with a collection of items as part of a point-in-time transaction.
Sequence-based analysis adds the dimension of time: it links a customer's purchases across a series of transactions to identify a typical set of purchases that might predict the subsequent purchase of a specific item.
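A minimal Python sketch of the sequence-based idea, assuming each customer's purchases can be linked together (for example through an account number) and are already sorted by time. The function `followers` and the camping-store histories are hypothetical. For customers who bought a given item, it counts which items they bought in a later transaction, so items bought earlier or in the same basket are ignored.

```python
from collections import Counter


def followers(histories, item):
    """For customers who bought `item`, count what they bought in LATER transactions.

    histories: dict mapping customer id -> list of transactions (sets of items),
               already sorted by time.
    Returns (number of customers who bought `item`, Counter of later items),
    where each later item is counted at most once per customer.
    """
    n_with_item = 0
    later_counts = Counter()
    for transactions in histories.values():
        # Index of the first transaction that contains the item, or None.
        first = next((i for i, t in enumerate(transactions) if item in t), None)
        if first is None:
            continue
        n_with_item += 1
        later_items = set().union(*transactions[first + 1:])
        later_items.discard(item)
        later_counts.update(later_items)
    return n_with_item, later_counts


# Hypothetical time-ordered purchase histories.
histories = {
    "c1": [{"tent"}, {"sleeping bag"}, {"lantern"}],
    "c2": [{"tent", "stove"}, {"sleeping bag"}],
    "c3": [{"tent"}, {"lantern", "sleeping bag"}],
    "c4": [{"sleeping bag"}, {"tent"}],
    "c5": [{"stove"}],
}

n, counts = followers(histories, "tent")
for later_item, count in counts.most_common():
    print(f"tent -> {later_item}: {count} of {n} customers")
# tent -> sleeping bag: 3 of 4 customers
# tent -> lantern: 2 of 4 customers
```

Customer c4 bought a sleeping bag before the tent, so that purchase is not counted: order matters here, unlike in plain market-basket analysis.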
Clustering
Clustering approaches address segmentation problems.
These approaches assign records with a large number of attributes to a relatively small set of groups, or "segments."
Example: The buying habits of multiple population segments might be compared to determine which segments to target for a new sales campaign.
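The slides do not name a clustering algorithm; k-means is one widely used choice and is swapped in here purely as an illustration. This minimal Python sketch assumes customers are described by two numeric attributes (hypothetical visits per month and average spend in dollars) and uses the first k records as starting centroids so the result is reproducible.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means: alternate between assigning points and moving centroids."""
    # Deterministic start for reproducibility: the first k points are the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the segment with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed segment, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:  # keep the old centroid if a segment ends up empty
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids


# Hypothetical customers: (visits per month, average spend in dollars)
customers = [(2, 20), (3, 25), (2, 22), (20, 200), (22, 210), (19, 190)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice attributes are usually rescaled first, since k-means uses plain Euclidean distance and the spend column would otherwise dominate.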
Classification
The most commonly applied data mining technique.
Uses a set of preclassified examples to determine the set of parameters required for proper discrimination.
Example: A classifier derived from the classification approach that is capable of identifying risky loans could be used to aid in the decision of whether to grant a loan to an individual.
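A decision stump (a one-rule classifier) is about the simplest way to see how preclassified examples determine the parameters: here the parameters are which attribute to test and what threshold to use. This is a stand-in chosen for illustration, not necessarily the technique the slides had in mind; the loan attributes `debt_ratio` and `years_employed`, the helpers `train_stump` and `predict`, and the training data are all hypothetical.

```python
def train_stump(examples):
    """Learn a one-rule classifier (decision stump) from preclassified examples.

    examples: list of (features_dict, label), where label True means "risky".
    Tries every feature, every observed value as a threshold, and both
    directions, keeping the rule with the fewest training errors.
    Returns (errors, feature, threshold, risky_if_above).
    """
    best = None
    features = examples[0][0].keys()
    for f in features:
        for threshold in sorted({x[f] for x, _ in examples}):
            for risky_if_above in (True, False):
                errors = sum(
                    ((x[f] > threshold) == risky_if_above) != label
                    for x, label in examples
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, risky_if_above)
    return best


def predict(stump, x):
    """Apply the learned rule to a new applicant; True means "risky"."""
    _, f, threshold, risky_if_above = stump
    return (x[f] > threshold) == risky_if_above


# Hypothetical preclassified loans: (applicant attributes, was the loan risky?)
loans = [
    ({"debt_ratio": 0.10, "years_employed": 8}, False),
    ({"debt_ratio": 0.20, "years_employed": 5}, False),
    ({"debt_ratio": 0.25, "years_employed": 1}, False),
    ({"debt_ratio": 0.45, "years_employed": 6}, True),
    ({"debt_ratio": 0.55, "years_employed": 2}, True),
    ({"debt_ratio": 0.60, "years_employed": 0}, True),
]

stump = train_stump(loans)
print(stump)  # (0, 'debt_ratio', 0.25, True)
print(predict(stump, {"debt_ratio": 0.50, "years_employed": 3}))  # True
```

The learned rule (flag a loan as risky when the debt ratio exceeds 0.25) is also easy to state in human terms, which is not true of many more complex classifiers.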
Issues of Data Mining
Present-day tools are powerful but require significant expertise to implement effectively.
Susceptibility to "dirty" or irrelevant data.
Inability to "explain" results in human terms.
Issues
Susceptibility to "dirty" or irrelevant data
Today's data mining tools simply take everything they are given as factual and draw the resulting conclusions.
Users must take the necessary precautions to ensure that the data being analyzed is "clean."
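One way to take those precautions is to screen records before they reach the mining tool. The following is a minimal Python sketch; the checks (exact duplicates, missing required fields, implausible numeric ranges), the `clean` helper, and the customer records are all hypothetical examples of what "clean" might mean for a given data set.

```python
def clean(records, required, valid_ranges):
    """Split records into (kept, rejected) before they reach a mining tool.

    Rejects exact duplicates, records missing a required field, and records
    whose numeric fields fall outside a plausible (low, high) range.
    Each rejected entry is (record, reason).
    """
    seen = set()
    kept, rejected = [], []
    for r in records:
        key = tuple(sorted(r.items()))  # hashable fingerprint of the whole record
        if key in seen:
            rejected.append((r, "duplicate"))
            continue
        seen.add(key)
        missing = [f for f in required if r.get(f) in (None, "")]
        if missing:
            rejected.append((r, f"missing {missing[0]}"))
            continue
        out_of_range = [
            f for f, (low, high) in valid_ranges.items()
            if r.get(f) is not None and not low <= r[f] <= high
        ]
        if out_of_range:
            rejected.append((r, f"out of range: {out_of_range[0]}"))
            continue
        kept.append(r)
    return kept, rejected


# Hypothetical customer records with typical "dirty" problems.
customers = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": 29, "income": None},    # missing value
    {"id": 3, "age": 212, "income": 61000},  # implausible age
    {"id": 1, "age": 34, "income": 52000},   # duplicate of the first record
    {"id": 4, "age": 45, "income": 73000},
]

kept, rejected = clean(customers, required=["id", "age", "income"],
                       valid_ranges={"age": (0, 120)})
print([r["id"] for r in kept])             # [1, 4]
print([reason for _, reason in rejected])  # ['missing income', 'out of range: age', 'duplicate']
```

Which checks are appropriate depends on the data; the point is that the mining tool itself will not make them.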
Issues, cont’
Inability to "explain" results in human terms
Many of the tools employed in data mining analysis use complex mathematical algorithms that are not easily mapped into human terms.
What good does the information do if you don’t understand it?
The End