datamining process 17.03.12

Upload: sweetvision

Post on 05-Apr-2018

  • 7/31/2019 DataMining Process 17.03.12

    Data Mining & Its Process

    03/17/2012 Advanced Database Management Systems

    Contents

    Data Mining Definition

    Data Mining Process

    Data Mining Process Steps

    Data Mining Tools

    Data Mining

    A process of discovering actionable information from large sets of data.

    It uses mathematical analysis to derive patterns and trends that exist in data. Typically, these patterns cannot be discovered by traditional data exploration, because the relationships are too complex or because there is too much data.

    These patterns and trends can be collected and defined as a data mining model.

    Applications to business scenarios

    Mining models can be applied to specific business scenarios, such as:

    Forecasting sales

    Targeting mailings toward specific customers

    Determining which products are likely to be sold together

    Finding sequences in the order that customers add products to a shopping cart
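Determining which products are likely to be sold together is usually approached with association analysis. A minimal sketch of its first step, counting how often each pair of products appears in the same basket (the pair's support), might look like the following. The basket data and the 0.5 support threshold are made up for illustration; a full algorithm such as Apriori would go on to derive association rules from counts like these.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Count how many baskets contain each unordered pair of products."""
    pair_counts = Counter()
    for basket in baskets:
        # set() drops repeated items in one basket; sorted() makes ("a", "b") and ("b", "a") the same pair
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return pair_counts


def frequent_pairs(baskets, min_support=0.5):
    """Return pairs whose support (fraction of baskets containing both items) is at least min_support."""
    n = len(baskets)
    return {
        pair: count / n
        for pair, count in co_occurrence(baskets).items()
        if count / n >= min_support
    }


# Illustrative shopping baskets (hypothetical data)
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
]
print(frequent_pairs(baskets, min_support=0.5))
# {('bread', 'milk'): 0.5, ('bread', 'butter'): 0.5}
```

Here bread appears with milk in two of the four baskets and with butter in two of the four, so both pairs reach the 0.5 threshold, while the other pairs appear only once.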

    Data mining process

    Six steps:

    1. Defining the Problem

    2. Preparing Data

    3. Exploring Data

    4. Building Models

    5. Exploring and Validating Models

    6. Deploying and Updating Models

    Relationship between the steps

    Explanation

    Each step does not necessarily lead directly to the next step.

    Creating a data mining model is a dynamic and iterative process. After exploring the data, you may find that the data is insufficient to create the appropriate mining models, and that more data therefore has to be gathered.

    After building several models, you may realize that the models do not adequately answer the problem you defined, and that you must therefore redefine the problem.

    The models may have to be updated after they have been deployed because more data has become available. Each step in the process might need to be repeated many times in order to create a good model.
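The iterative relationship between the six steps can be sketched as a small state machine: by default each step advances to the next, but the situations described above send the process back to an earlier step. The issue labels and the exact step each one returns to are assumptions made for illustration (for example, a validation failure could equally send you back to reinvestigate the data rather than to redefine the problem).

```python
from enum import IntEnum


class Step(IntEnum):
    DEFINE_PROBLEM = 1
    PREPARE_DATA = 2
    EXPLORE_DATA = 3
    BUILD_MODELS = 4
    VALIDATE_MODELS = 5
    DEPLOY_MODELS = 6


# (current step, issue found) -> step to return to.
# The issue strings are hypothetical labels, not standard terminology.
LOOP_BACKS = {
    (Step.EXPLORE_DATA, "insufficient data"): Step.PREPARE_DATA,
    (Step.BUILD_MODELS, "models do not answer the problem"): Step.DEFINE_PROBLEM,
    (Step.VALIDATE_MODELS, "no model performs well"): Step.DEFINE_PROBLEM,
    (Step.DEPLOY_MODELS, "new data available"): Step.BUILD_MODELS,
}


def next_step(current, issue=None):
    """Advance to the following step, or loop back if a known issue was found."""
    if issue is not None:
        return LOOP_BACKS[(current, issue)]
    if current == Step.DEPLOY_MODELS:
        return Step.DEPLOY_MODELS  # deployment is ongoing; updates arrive via LOOP_BACKS
    return Step(current + 1)


# Trace one pass in which exploration reveals that more data is needed.
step = Step.DEFINE_PROBLEM
history = [step]
for issue in [None, None, "insufficient data", None, None, None, None]:
    step = next_step(step, issue)
    history.append(step)
print([s.name for s in history])
```

The trace visits PREPARE_DATA and EXPLORE_DATA twice before reaching deployment, which is the kind of repetition the slide describes.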

    Defining the Problem

    Analyzing the business requirements

    Considering ways to provide an answer to the problem

    Defining the scope of the problem

    Defining the metrics by which the model will be evaluated

    Defining specific objectives for the data mining project
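One way to make the outcome of this step concrete is to record the question, the target attribute, the scope, the evaluation metric, and the objective in one place that later steps can check against. The `ProblemDefinition` class and every field value below are hypothetical; this is a sketch of the idea, not a format used by any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemDefinition:
    """Hypothetical record of the decisions made while defining the problem."""
    business_question: str
    target_attribute: str                       # the attribute the model should predict
    scope: dict = field(default_factory=dict)   # e.g. regions or time range covered
    metric: str = "accuracy"                    # how the model will be evaluated
    minimum_score: float = 0.0                  # objective the model must reach

    def objective_met(self, score: float) -> bool:
        """True if a model's score on the chosen metric reaches the objective."""
        return score >= self.minimum_score


problem = ProblemDefinition(
    business_question="Which customers are likely to buy a bike?",
    target_attribute="bike_buyer",
    scope={"region": "North America", "years": (2010, 2011)},
    metric="accuracy",
    minimum_score=0.75,
)
print(problem.objective_met(0.80))  # True
```

Writing the metric and threshold down before any model is built keeps the later validation step honest: a model is judged against an objective fixed in advance.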

    The Tasks

    What are you looking for? What types of relationships are you trying to find? Does the problem you are trying to solve reflect the policies or processes of the business?

    Do you want to make predictions from the data mining model, or just look for interesting patterns and associations?

    Which attribute of the dataset do you want to try to predict?

    How are the columns related? If there are multiple tables, how are the tables related?

    How is the data distributed? Is the data seasonal? Does the data accurately represent the processes of the business?

    To answer these questions, a data availability study may have to be conducted to investigate the needs of the business users with regard to the available data. If the data does not support the needs of the users, the project might have to be redefined.
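A very small data availability study can be sketched as two checks: which attributes each business need depends on that no table provides, and which columns the tables share (candidate answers to "how are the tables related?"). The table layouts and business needs are invented for illustration, and treating identically named columns as join keys is a heuristic assumption, not a rule.

```python
def data_availability(required, tables):
    """Return, for each business need, the required attributes that no table provides."""
    available = {column for columns in tables.values() for column in columns}
    return {
        need: sorted(set(attributes) - available)
        for need, attributes in required.items()
        if not set(attributes) <= available
    }


def shared_columns(tables):
    """List column names shared by each pair of tables (candidate join keys)."""
    names = list(tables)
    return {
        (a, b): sorted(set(tables[a]) & set(tables[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if set(tables[a]) & set(tables[b])
    }


# Hypothetical source tables and business needs
tables = {
    "Customers": ["customer_id", "age", "income", "region"],
    "Orders": ["order_id", "customer_id", "order_date", "ship_date", "total_price"],
}
required = {
    "predict repeat purchases": ["customer_id", "order_date", "total_price"],
    "seasonal sales forecast": ["order_date", "total_price", "store_id"],
}
print(data_availability(required, tables))  # {'seasonal sales forecast': ['store_id']}
print(shared_columns(tables))               # {('Customers', 'Orders'): ['customer_id']}
```

If a need such as the seasonal forecast cannot be supported, that is the signal the slide mentions for redefining the project.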

    Phase II: Preparing Data

    Preparing Data

    Removes inconsistencies such as incorrect or missing entries. For example, the data might show that a customer bought a product before the product was offered on the market, or that the customer shops regularly at a store located 2,000 miles from her home.

    Finds hidden correlations in the data

    Identifies the sources of data that are the most accurate

    Determines which columns are the most appropriate for use in analysis.

    For example, should you use the shipping date or the order date?

    Is the best sales influencer the quantity, the total price, or a discounted price?

    Therefore, before you start building mining models, you should identify these problems and determine how to fix them.

    Phase III: Exploring Data

    Exploring Data

    Exploration techniques include

    calculating the minimum and maximum values,

    calculating the mean and standard deviation, and

    looking at the distribution of the data.

    For example:

    By reviewing the maximum, minimum, and mean values, you might find that the data is not representative of your customers or business processes, in which case you must obtain more balanced data.

    Standard deviations and other distribution values can provide useful information about the stability and accuracy of the results. A large standard deviation can indicate that adding more data might help improve the model.

    Exploring the data helps you better understand the business problem and decide whether the dataset contains flawed data. You can then devise a strategy for fixing the problems and gain a deeper understanding of the behaviors that are typical of your business.
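The exploration techniques listed above (minimum and maximum, mean, standard deviation, and distribution) can be computed with Python's standard `statistics` module. The age values and the bin width of 10 are illustrative assumptions; what counts as a "large" standard deviation remains a judgment call for the analyst.

```python
import statistics
from collections import Counter


def summarize(values, bin_width):
    """Minimum, maximum, mean, sample standard deviation, and a simple histogram."""
    # Each value is assigned to the lower edge of its bin, e.g. 23 -> 20 for bin_width=10
    histogram = Counter((v // bin_width) * bin_width for v in values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "histogram": dict(sorted(histogram.items())),
    }


ages = [23, 25, 31, 34, 35, 38, 41, 45, 52, 66]  # hypothetical customer ages
for name, value in summarize(ages, bin_width=10).items():
    print(name, value)
```

A histogram with most customers in one bin, or a minimum and maximum that exclude an important customer group, is the kind of result that would suggest the data is not representative.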

    Phase IV: Building Models

    Building Models

    A mining structure is created to define the data explored in the previous phase. It defines the source of the data but does not contain any data until it is processed.

    Processing a model is called training. During training, specific mathematical algorithms are applied to the data in the structure to extract patterns.

    The patterns found in the training process depend on the selection of training data, the algorithm chosen, and how the algorithm has been configured.

    Whenever the data changes, both the mining structure and the mining model must be updated. When a mining structure is updated by reprocessing it, data is retrieved from the source, including any new data, and the mining structure is repopulated. The mining models are then retrained on the new data.
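The relationship between a mining structure and a mining model can be sketched with two small classes: the structure only knows where its data comes from until it is processed, and the model extracts patterns from whatever rows the structure currently holds. Both classes are hypothetical and deliberately simplified; they mirror the terminology above but are not the Analysis Services object model, and the "algorithm" here just predicts the most frequent target value per input value.

```python
from collections import Counter
from typing import Callable, Iterable


class MiningStructure:
    """Defines the data source and columns; holds no rows until processed."""

    def __init__(self, source: Callable[[], Iterable[dict]], columns: list):
        self.source = source
        self.columns = columns
        self.rows = []  # empty until process() is called

    def process(self):
        # Re-read the source (including any new data) and repopulate the structure.
        self.rows = [{c: row[c] for c in self.columns} for row in self.source()]


class MostFrequentValueModel:
    """Toy algorithm: for each input value, predict the most common target value."""

    def __init__(self, structure, input_col, target_col):
        self.structure = structure
        self.input_col = input_col
        self.target_col = target_col
        self.rules = {}

    def train(self):
        counts = {}
        for row in self.structure.rows:
            counts.setdefault(row[self.input_col], Counter())[row[self.target_col]] += 1
        self.rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}

    def predict(self, value):
        return self.rules.get(value)


source_data = [  # hypothetical source table
    {"region": "North", "bought": True},
    {"region": "North", "bought": True},
    {"region": "South", "bought": False},
]
structure = MiningStructure(source=lambda: source_data, columns=["region", "bought"])
model = MostFrequentValueModel(structure, input_col="region", target_col="bought")
print(len(structure.rows))      # 0: nothing is loaded before processing
structure.process()
model.train()
print(model.predict("South"))   # False

# New data arrives: reprocess the structure, then retrain the model.
source_data.extend([{"region": "South", "bought": True}, {"region": "South", "bought": True}])
structure.process()
model.train()
print(model.predict("South"))   # True
```

The usage shows why both objects must be updated together: retraining without reprocessing would still see only the original three rows.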

    Phase V: Validating Models

    Validating Models

    Before a model is deployed into a production environment, it is tested for how well it performs. All the models created with different configurations are tested to see which yields the best results for the specified problem and data.

    Analysis Services provides tools that help to separate data into training and testing datasets, so that you can accurately assess the performance of all models on the same data.

    The training dataset is used to build the model, and the testing dataset is used to test the accuracy of the model by creating prediction queries.

    What if none of the models created in the Building Models step perform well?

    Return to a previous step in the process and redefine the problem or reinvestigate the data in the original dataset.
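Outside Analysis Services, the same idea (one fixed split, every candidate model scored on the same test rows) can be sketched in plain Python. The customer data is synthetic, the 30% test fraction and seed are arbitrary, and the two "configurations" are the same hypothetical threshold rule applied to different input columns; this illustrates holdout validation, not the Analysis Services tooling.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle once with a fixed seed so every model is tested on the same rows."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (training set, testing set)


def accuracy(predict, rows):
    """Fraction of rows where the prediction matches the actual 'bought' value."""
    return sum(predict(r) == r["bought"] for r in rows) / len(rows)


def train_threshold_model(train_rows, feature):
    """Pick the threshold on one feature that maximizes training accuracy."""
    candidates = sorted({r[feature] for r in train_rows})
    best = max(candidates, key=lambda t: accuracy(lambda r: r[feature] >= t, train_rows))
    return lambda r: r[feature] >= best


# Synthetic customers: in this made-up data, purchases depend only on income.
rng = random.Random(1)
customers = [
    {"id": i, "income": income, "age": rng.randint(18, 80), "bought": income >= 50}
    for i, income in enumerate(range(20, 80))
]

train, test = train_test_split(customers)
models = {feature: train_threshold_model(train, feature) for feature in ("income", "age")}
scores = {feature: accuracy(model, test) for feature, model in models.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Because the split is made once, the comparison between the configurations reflects the models rather than differences in the test rows each one happened to see.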

    Phase VI: Deploying and Updating Models

    Deploying and Updating Models

    Deploy the models that performed the best to a production environment.

    After the mining models exist in a production environment, various tasks can be performed, depending on your needs. The following are some of the tasks you can perform:

    Use the models to create predictions, which you can then use to make business decisions.

    Create queries to retrieve statistics, rules, or formulas from the model.

    Embed data mining functionality directly into an application. You can include Analysis Management Objects (AMO), which contains a set of objects that your application can use to create, alter, process, and delete mining structures and mining models.

    Use Integration Services to create a package in which a mining model is used to intelligently separate incoming data into multiple tables. For example, if a database is continually updated with potential customers, you could use a mining model together with Integration Services to split the incoming data into customers who are likely to purchase a product and customers who are likely not to purchase a product.

    Create a report that lets users directly query against an existing mining model.

    Update the models after review and analysis. Any update requires that you reprocess the models.

    Update the models dynamically as more data comes into the organization. Making constant changes to improve the effectiveness of the solution should be part of the deployment strategy.
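The Integration Services example (splitting incoming customers by a model's prediction) boils down to routing each incoming row according to a score. A minimal Python sketch of that routing logic follows; `toy_purchase_probability` is a hypothetical stand-in for a deployed mining model, and the 0.5 threshold and customer rows are assumptions. It does not use Integration Services itself.

```python
def split_by_prediction(incoming, predict_purchase, threshold=0.5):
    """Route each incoming row to 'likely' or 'unlikely' based on a model score."""
    likely, unlikely = [], []
    for customer in incoming:
        (likely if predict_purchase(customer) >= threshold else unlikely).append(customer)
    return likely, unlikely


def toy_purchase_probability(customer):
    # Hypothetical stand-in for a deployed mining model's prediction query.
    return 0.8 if customer["income"] >= 50 else 0.2


incoming = [  # hypothetical new potential customers
    {"name": "Ana", "income": 72},
    {"name": "Ben", "income": 31},
    {"name": "Cy", "income": 55},
]
likely, unlikely = split_by_prediction(incoming, toy_purchase_probability)
print([c["name"] for c in likely], [c["name"] for c in unlikely])
```

Passing the model in as a function keeps the routing step unchanged when the model is retrained or replaced, which fits the advice to update deployed models as new data arrives.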

    Data Mining Tools

    Some of the commercially and publicly available tools are:

    DataEngine

    AgentBase/Marketeer

    BusinessMiner

    CART

    Data Surveyor

    Data Mining Suite

    DataMind

    IBM Datajoiner

    Kensington 2000, etc.

    For the latest tools and their performance, visit http://www.kdnuggets.com and http://www.knowledgestorm.com.
