lift chart (analysis services - data mining)

Upload: vlaresearch

Post on 05-Apr-2018


  • 7/31/2019 Lift Chart (Analysis Services - Data Mining)


    http://msdn.microsoft.com/en-us/library/ms175428

    Lift Chart (Analysis Services - Data Mining)

    SQL Server 2008 R2

    You can view different types of charts in the Lift Chart tab of the Mining Accuracy Chart tab of Data Mining Designer, depending on the model that you select, the predictable attribute in the model, and other settings.

    If your model predicts a discrete value, you can create a lift chart or profit chart. A lift chart compares the accuracy of the predictions of each model, and can be configured to show accuracy for predictions in general, or for predictions of a specific value. A profit chart is a related chart type that contains the same information as a lift chart, but also displays the projected increase in profit that is associated with using each model. Use the Chart Type list to select the type of chart you want.

    Note: You cannot display time series models in a lift chart or profit chart, but you can view a chart that contains both the historical series and predictions based on the series by using the Mining Model Prediction tab. For more information, see Microsoft Time Series Algorithm.

    For More Information: Profit Chart (Analysis Services - Data Mining), Scatter Plot (Analysis Services - Data Mining)

    Scenario

    The Lift Chart tab displays a graphical representation of the change in lift that a mining model causes. For example, the marketing department at Adventure Works Cycles wants to create a targeted mailing campaign. From past campaigns, they know that a 10 percent response rate is typical. They have a list of 10,000 potential customers stored in a table in the database. Therefore, based on the typical response rate, they can expect 1,000 of the potential customers to respond.

    However, the money budgeted for the project is not enough to reach all 10,000 customers in the database. Based on the budget, they can afford to mail an advertisement to only 5,000 customers. The marketing department has two choices:

    Randomly select 5,000 customers to target

    Use a mining model to target the 5,000 customers who are most likely to respond

    If the company randomly selects 5,000 customers, they can expect to receive only 500 responses, based on the typical response rate. This scenario is what the random line in the lift chart represents.

    However, if the marketing department uses a mining model to target their mailing, they can expect a larger response rate because they can target those customers who are most likely to respond. If the model is perfect, its predictions are never wrong, and the company could expect to receive 1,000 responses by mailing to the 1,000 potential customers recommended by the model. This scenario is what the ideal line in the lift chart represents. In reality, the mining model most likely falls between these two extremes, somewhere between a random guess and a perfect prediction. Any improvement over the random guess is considered to be lift.
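The arithmetic behind the random and ideal lines in this scenario can be sketched in a few lines of Python. This is only an illustration under the figures quoted above (10,000 customers, 5,000 mailings, a 10 percent response rate); the function name `expected_responses` is a hypothetical helper and is not part of Analysis Services.

```python
def expected_responses(population, mailed, response_rate):
    """Return (random, ideal) expected response counts for a mailing.

    random: customers are picked at random, so the typical rate applies.
    ideal:  a perfect model mails only to true responders first.
    """
    total_responders = round(population * response_rate)
    random_pick = round(mailed * response_rate)
    ideal_pick = min(mailed, total_responders)
    return random_pick, ideal_pick


# Figures from the Adventure Works scenario.
random_pick, ideal_pick = expected_responses(10_000, 5_000, 0.10)
print(random_pick, ideal_pick)  # 500 1000
```

A real model would land somewhere between the two numbers, and the gap between its result and the random figure is the lift.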

    Understanding the Lift Chart

    You can create two types of lift charts: one in which you specify a target value for the predictable column, and one in which you do not specify the value. When you switch between the Input Selection tab and the Lift Chart tab, the chart is updated to reflect any changes that you made in the column mappings or other settings.

    Lift Chart with Target Value

    The following chart shows a lift chart for the Targeted Mailing model that you create in the Basic Data Mining Tutorial. In this chart, the target attribute is [Bike Buyer] and the target value is 1, meaning that the customer purchased a bike or is likely to do so. The lift chart thus shows the improvement the model provides when identifying customers who are likely to buy a bike.

    In addition to the basic model, the chart includes a related model that has been filtered to target specific customers. You can add multiple models to a lift chart, as long as the models all have the same predictable attribute. The filter on the related model restricts the cases used in both training and evaluation to customers who are under the age of 30. As a result, the number of cases that the model is evaluated against differs for the basic model and the filtered model. This point is important to remember when you interpret the prediction results and other statistics.

    The x-axis of the chart represents the percentage of the test dataset that is used to compare the predictions. The y-axis of the chart represents the percentage of predicted values.

    The diagonal straight line, shown here in blue, appears in every chart. It represents the results of random guessing, and is the baseline against which to evaluate lift. For each model that you add to a lift chart, you get two additional lines: one line shows the ideal results for the training data set if you could create a model that always predicted perfectly, and the second line shows the actual lift, or improvement in results, for the model.

    In this example, the ideal line for the filtered model is shown in dark blue, and the line for actual lift in yellow. You can tell from the chart that the ideal line peaks at around 40 percent, meaning that if you had a perfect model, you could reach 100 percent of your targeted customers by sending a mailing to only 40 percent of the total population. The actual lift for the filtered model when you target 40 percent of the population is between 60 and 70 percent, meaning you could reach 60-70 percent of your targeted customers by sending the mailing to 40 percent of the total customer population.
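To make the curves concrete, the following Python sketch computes the points of a lift (cumulative gains) curve from scored test cases. It is a simplified illustration, not the algorithm that Analysis Services uses internally: the function name `gains_curve` and the sample data are assumptions, and the sketch assumes at least one case has the target value.

```python
def gains_curve(probabilities, actuals, target=1):
    """Return (pct_population, pct_targets_captured) points.

    Cases are ranked from most to least likely to have the target
    value; each point says what share of all target cases has been
    reached after contacting that share of the population.
    """
    ranked = sorted(zip(probabilities, actuals),
                    key=lambda pair: pair[0], reverse=True)
    total_targets = sum(1 for _, actual in ranked if actual == target)
    points = []
    captured = 0
    for i, (_, actual) in enumerate(ranked, start=1):
        if actual == target:
            captured += 1
        points.append((100.0 * i / len(ranked),
                       100.0 * captured / total_targets))
    return points


# Hypothetical test set: 10 cases, 4 of which are actual buyers.
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
points = gains_curve(probs, actual)
print(points[3])  # (40.0, 75.0)
```

In this made-up data set 40 percent of the cases are buyers, so a perfect model would reach 100 percent of buyers at the 40 percent mark, random guessing would reach 40 percent, and the sample model reaches 75 percent.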

    The Mining Legend contains the actual values at any point on the curves. You can change the place that is measured by clicking the vertical gray bar and moving it. In the chart, the gray line has been moved to 30 percent, because this is the point where both the filtered and unfiltered models appear to be most effective, and after this point the amount of lift declines.

    The Mining Legend also contains scores and statistics that help you interpret the chart. These results represent the accuracy of the model at the gray line, which in this scenario is positioned to include 30 percent of the overall test cases.

    Series, model                                Score   Target population   Predict probability
    Targeted mailing all                         0.71    47.40%              61.38%
    Targeted mailing under 30                    0.85    51.81%              46.62%
    Random guess model                                   31.00%
    Ideal model for: Targeted mailing all                62.48%
    Ideal model for: Targeted mailing under 30           65.28%

    From these results, you can see that, when measured at 30 percent of all cases, the general model (Targeted mailing all) can predict the bike buying behavior of 47.40 percent of the target population. In other words, if you sent out a targeted mailing to only 30 percent of the customers in your database, you could reach slightly less than half of your target audience. If you used the filtered model, you could reach about 51 percent of your targeted customers.

    The value for Predict probability represents the threshold required to include a customer among the "likely to buy" cases. For each case, the model estimates the accuracy of each prediction and stores that value, which you can use to filter out or to target customers. For example, to identify the customers from the basic model who are likely buyers, you would use a query to retrieve cases with a Predict probability of at least 61 percent. To get the customers targeted by the filtered model, you would create a query that retrieves cases that meet all the criteria: the age restriction and a Predict probability value of at least 46 percent.
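The threshold logic described above can be sketched in Python. This is not a DMX prediction query; it only illustrates the filtering step on results that have already been retrieved. The record layout, the field names `age` and `predict_probability`, and the helper `likely_buyers` are all hypothetical.

```python
def likely_buyers(cases, min_probability, max_age=None):
    """Keep cases at or above the probability threshold.

    If max_age is given, also require the customer to be younger
    than max_age (the filtered model covers customers under 30).
    """
    selected = []
    for case in cases:
        if case["predict_probability"] < min_probability:
            continue
        if max_age is not None and case["age"] >= max_age:
            continue
        selected.append(case)
    return selected


# Hypothetical scored cases.
cases = [
    {"customer": "C1", "age": 27, "predict_probability": 0.72},
    {"customer": "C2", "age": 45, "predict_probability": 0.65},
    {"customer": "C3", "age": 24, "predict_probability": 0.50},
    {"customer": "C4", "age": 52, "predict_probability": 0.48},
    {"customer": "C5", "age": 29, "predict_probability": 0.30},
]

basic = likely_buyers(cases, 0.61)                 # basic model threshold
filtered = likely_buyers(cases, 0.46, max_age=30)  # filtered model criteria
print([c["customer"] for c in basic])     # ['C1', 'C2']
print([c["customer"] for c in filtered])  # ['C1', 'C3']
```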

    It is interesting to compare the models. The filtered model appears to capture more potential customers, but when you target customers with a prediction probability score of 46 percent, you also have a 53 percent chance of sending a mailing to someone who will not buy a bike. Therefore, if you were deciding which model is better, you would want to balance the greater precision and smaller target size of the filtered model against the selectiveness of the basic model.

    The value for Score helps you compare models by calculating the effectiveness of the model across a normalized population. A higher score is better, so in this case you might decide that targeting customers under 30 is the most effective strategy, despite the lower prediction probability.

    Lift Chart for Model with No Target Value

    If you do not specify the state of the predictable column, you create the type of chart shown in the following diagram. This chart shows how the model performs for all states of the predictable attribute. For example, this chart would tell you how well the model predicts both customers who are likely to buy a bike, and those who are unlikely to buy a bike.

    The x-axis is the same as in the chart with the predictable column specified, but the y-axis now represents the percentage of predictions that are correct. Therefore, the ideal line is the diagonal line, which shows that at 50 percent of the data, the model correctly predicts 50 percent of the cases, the maximum that can be expected.


    You can click in the chart to move the vertical gray bar, and the Mining Legend displays the percentage of cases overall, and the percentage of cases that were predicted correctly. For example, if you position the gray slider bar at the 50 percent mark, the Mining Legend displays the following accuracy scores. These figures are based on the TM_Decision Tree model created in the Basic Data Mining Tutorial.

    Series, model       Score   Target population   Predict probability
    TM_Decision Tree    0.77    40.50%              72.91%
    Ideal model                 50.00%

    This table tells you that, at 50 percent of the population, the model that you created correctly predicts 40 percent of the cases. You might consider this a reasonably accurate model. However, remember that this particular model predicts all values of the predictable attribute. Therefore, the model might be accurate in predicting that 90 percent of customers will not buy a bike.
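The caveat above, that overall accuracy can be carried by the most common state, can be illustrated with a short Python sketch. The data is invented for illustration (9 non-buyers and 1 buyer), and `accuracy_by_state` is a hypothetical helper rather than anything the Mining Legend reports.

```python
from collections import Counter


def accuracy_by_state(actuals, predictions):
    """Return the share of correct predictions for each actual state."""
    totals = Counter(actuals)
    correct = Counter(a for a, p in zip(actuals, predictions) if a == p)
    return {state: correct[state] / totals[state] for state in totals}


# Hypothetical test set: 90 percent of customers do not buy a bike,
# and the model always predicts "will not buy" (0).
actuals = [0] * 9 + [1]
predictions = [0] * 10

overall = sum(a == p for a, p in zip(actuals, predictions)) / len(actuals)
per_state = accuracy_by_state(actuals, predictions)
print(overall)    # 0.9
print(per_state)  # {0: 1.0, 1: 0.0}
```

The single overall figure looks strong even though the model never identifies a buyer, which is why a chart with a target value is more informative when one state is the one you care about.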

    Note

    The prediction accuracy for all discrete values of the predictable attribute is shown in a single line. If you want to see prediction accuracy lines for any individual value of the predictable attribute, you must create a separate lift chart for that value.


    Creating a Lift Chart

    The Basic Data Mining Tutorial includes a walkthrough of how to create a lift chart for the Targeted Mailing model. For more information, see Testing Accuracy with Lift Charts (Basic Data Mining Tutorial).

    For a step-by-step procedure that applies to all chart types, see How to: Create an Accuracy Chart for a Mining Model.

    See Also

    Concepts

    Validating Data Mining Models (Analysis Services - Data Mining)

    Profit Chart (Analysis Services - Data Mining)

    Classification Matrix (Analysis Services - Data Mining)

    Scatter Plot (Analysis Services - Data Mining)

    Cross-Validation Report (Analysis Services - Data Mining)

    Other Resources


    Mining Accuracy Chart Tab: How-to Topics
