ACTION RULES

Slides by A A Tzacheva.

Upload: nancy-reed

Post on 25-Dec-2015



2

Knowledge Discovery

The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.

(Fayyad et al., 1996)

Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories.

Gartner Group

“The Saying that Knowledge Is Power Is Not Quite True… Used Knowledge Is Power”

Edward E. Free

3

Knowledge Discovery

Knowledge Discovery in Databases (KDD) is a new area of research that combines many algorithms and techniques used in artificial intelligence, statistics, databases, machine learning, etc.

KDD is the process of extracting previously unknown, non-obvious, new, and interesting information from huge amounts of data.

Past research on data mining has mostly focused on techniques for generating rules from datasets.

4

Knowledge Discovery

[Figure: the KDD process. Data → Cleaning / Integration → Data Warehouse → Selection / Transformation → Prepared data → Data Mining → Patterns → Evaluation / Visualization → Knowledge (Knowledge Base)]

5

Knowledge Discovery

[Pohle, 2003] Many data mining systems are good at deriving useful statistics and patterns from huge amounts of data, but they are not very smart at interpreting these results, which is crucial for turning them into interesting, understandable, and actionable knowledge.

There is a lack of sophisticated tool support for incorporating human domain knowledge into the mining process. This domain knowledge should be updated with the mining results.

Mining Process (Fayyad): [[Business Understanding] [Domain Knowledge]] [Data Understanding] [Data Preparation] [Modeling/Mining] [Evaluation] [Deployment]

6

Associations: two conditions occur together, with some confidence

Presumptive Objective

E = [Cond1 → Cond2]

Interestingness Function

Data Mining Task: For a given dataset D, language of facts L, interestingness function I_{D,L}, and threshold c, find association E such that I_{D,L}(E) > c efficiently. The knowledge engineer defines c.

7

Are All the “Discovered” Patterns Interesting?

Data mining algorithms discover a large number of rules, and we want only the subset that is interesting, because these algorithms discover accurate rules rather than interesting ones.

An association is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm.

Two aspects of rule interestingness have been studied in the data mining literature: objective and subjective measures.

Objective measures are data-driven and domain-independent. Generally, these measures evaluate the rules based on their quality and the similarity between them, rather than on the user's beliefs about the domain.

* Note: Domain here means the type of data, e.g. financial data (financial domain) or medical data (medical domain). If the measures are independent of the domain, they can always be calculated, no matter whether the data is financial, medical, or of any other type.

8

Objective Measure Examples

Assume A → B is an association rule over n objects. Some objective measures are:

Support or Strength: card[A ∧ B]

Confidence or Certainty Factor: card[A ∧ B]/card[A]

Coverage Factor: card[A ∧ B]/card[B]

Leverage: card[A ∧ B]/n – [card[A]/n]*[card[B]/n]

Lift: n*card[A ∧ B]/[card[A]*card[B]]
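The measures on this slide can be computed directly by counting rows. The sketch below is only an illustration: it assumes a rule A → B over n rows, reads the coverage factor as card[A ∧ B]/card[B] (an assumption, since the slide's symbols did not survive), and uses a small made-up dataset. The function name `objective_measures` is hypothetical.

```python
def objective_measures(rows, antecedent, consequent):
    """Count-based measures for the rule antecedent -> consequent.

    `antecedent` and `consequent` are predicates over a row.
    Assumes at least one row satisfies each of them (no zero division guard).
    """
    n = len(rows)
    n_a = sum(1 for r in rows if antecedent(r))                      # card[A]
    n_b = sum(1 for r in rows if consequent(r))                      # card[B]
    n_ab = sum(1 for r in rows if antecedent(r) and consequent(r))   # card[A and B]
    return {
        "support": n_ab,
        "confidence": n_ab / n_a,
        "coverage": n_ab / n_b,  # assumed definition, see lead-in
        "leverage": n_ab / n - (n_a / n) * (n_b / n),
        "lift": n * n_ab / (n_a * n_b),
    }


# Made-up toy data, not taken from the slides.
rows = [
    {"b": "P", "d": "L"},
    {"b": "P", "d": "L"},
    {"b": "S", "d": "L"},
    {"b": "S", "d": "H"},
]

m = objective_measures(rows, lambda r: r["b"] == "P", lambda r: r["d"] == "L")
print(m)
```

A lift above 1 (or a positive leverage) says the two conditions co-occur more often than they would if they were independent.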

9

Problem: Subjective Interestingness

A rule is:
- unexpected, if it contradicts the user's beliefs about the domain and therefore surprises the user
- novel, if it contributes to some extent to new knowledge
- actionable, if the user can take an action to his/her advantage based on this rule

In the data mining literature, actionability has been quantified in terms of unexpectedness. For example:

- most actionable knowledge is unexpected
- most unexpected knowledge is actionable

10

Actionability - Subjective

So, if we are able to calculate the actionability of a rule (the way we calculate support and confidence), then we have a way of knowing whether the rule is unexpected and interesting, that is, a way to measure its interestingness.

In the data mining literature, data mining is viewed as the process of turning data into information, information into action, and action into value or profit.

However, the task of finding actionable rules is not trivial. Actionability is seen as an elusive concept because it is difficult to know the space of all rules and the actions to be attached to them.

11

Argument: Objective Unexpectedness

Unexpectedness (does not depend on domain knowledge): if r = [A → B1] has a high confidence and r1 = [A*C → B2] has a high confidence, then r1 is unexpected.

Unexpectedness is inherently subjective, and the prior beliefs of the user form an important component of it.

[Padmanabhan & Tuzhilin] A → B is unexpected with respect to the belief α → β on the dataset D if the following conditions hold:
- B ∧ β ⊨ False [B and β logically contradict each other]
- A ∧ α holds on a large subset of D [A and α happen together]
- A*α → B is true, which means A*α → ¬β

12

Actionability

The actionability measure is based on the rule's benefit to the user, that is, the user can use the rule to act in his/her own interest.

This measure is very important for rule interestingness because users are always looking for patterns that improve their performance and help them work better.

The practical purpose of obtaining information is to improve the business: the information must support decision-making, so that actions can be performed to make the business succeed.

13

Actionable Rules and Action Rules

Actionable rule mining deals with the benefit-driven actions required for decision making.

Rules are unexpected if they "surprise" the user, and rules are actionable if the user can do something with them to his/her advantage.

For example, a user may be able to change undesirable/non-profitable patterns to desirable/profitable patterns.

There are methods which define actionability as an approximation of unexpectedness.

In order to produce unexpected and/or actionable rules, the system must know what the user expects, i.e., his/her existing knowledge or concepts about the domain.

Machine learning also typically assumes that the domain knowledge is correct or at least partially correct.

Although both unexpectedness and actionability are important, actionability is the key concept in most applications because actionable rules allow the user to do his/her job better by taking some specific actions in response to the discovered knowledge.

14

Action Rules

Next, we will focus on a special type of rules, called action rules, which are actionable rules. We will study a well-defined algorithm for discovering such rules.

These rules can be constructed from classification rules to suggest a way to re-classify objects (for instance customers or patients) to a desired state.

In e-commerce applications, this re-classification may mean that a consumer not interested in a certain product may now buy it, and therefore may fall into a group of more profitable customers. In the medical domain, this re-classification may mean how to change the class of a tumor from malignant to benign.

15

Action Rules

These groups are described by values of classification attributes in a decision table schema. By a decision table we mean any information system where the set of attributes is partitioned into conditions and decisions.

To discover action rules, the set of conditions must be partitioned into stable conditions and flexible conditions/attributes. For simplicity, we also assume that there is only one decision attribute.

For example, date of birth is a stable attribute, and the interest rate on any customer account is a flexible attribute (dependent on the bank).

The assumption that the decision attribute d is flexible is quite essential.

16

Decision table

Any information system S of the form S = (A_Fl ∪ A_St ∪ {d}), where
- d is a distinguished attribute called the decision,
- the elements of A_St are called stable conditions,
- the elements of A_Fl ∪ {d} are called flexible conditions.

Action Rules

Example of action rule:

[(b1, v1 → w1) ∧ (b2, v2 → w2) ∧ … ∧ (bp, vp → wp)](x) ⇒ [(d, k1 → k2)](x)

This means that if we change the value of attribute b1 from v1 to w1, the value of attribute b2 from v2 to w2, and so on up to the value of attribute bp from vp to wp, then the value of the decision attribute d will change from k1 to the desired value k2.

Assumption: (∀i)[(1 ≤ i ≤ p) → (bi ∈ A_Fl)]. In other words, the attributes b1, b2, …, bp are all flexible attributes.

* Note: the objects are the rows of the database (decision table), and the attributes are the columns of the database.

17

Action Rules

Decision Table

X    a  b  c  d
x1   0  S  0  L
x2   0  R  1  L
x3   0  S  1  L
x4   0  R  1  L
x5   2  P  2  L
x6   2  P  2  L
x7   2  S  2  H

{a, c} - stable attributes,
{b, d} - flexible attributes,
d - decision attribute (its values are L – Low profitability customer, and H – High profitability customer).

Rules discovered:
r1 = [(b, P) → (d, L)]
r2 = [(a, 2) ∧ (b, S) → (d, H)]

(r1, r2)-action rule: [(b, P → S)](x) ⇒ [(d, L → H)](x)
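The two classification rules and the resulting action rule can be checked against the seven-row decision table with a few lines of Python. This is a minimal sketch: the helper names `matching` and `confidence` are hypothetical, and the choice to keep only objects with a = 2 (so that they agree with r2 on its stable attribute) is an assumption about how the pair (r1, r2) is combined, since the slide's action rule does not list that condition.

```python
# Decision table from the slide: a, c stable; b, d flexible; d is the decision.
TABLE = {
    "x1": {"a": 0, "b": "S", "c": 0, "d": "L"},
    "x2": {"a": 0, "b": "R", "c": 1, "d": "L"},
    "x3": {"a": 0, "b": "S", "c": 1, "d": "L"},
    "x4": {"a": 0, "b": "R", "c": 1, "d": "L"},
    "x5": {"a": 2, "b": "P", "c": 2, "d": "L"},
    "x6": {"a": 2, "b": "P", "c": 2, "d": "L"},
    "x7": {"a": 2, "b": "S", "c": 2, "d": "H"},
}


def matching(table, conditions):
    """Names of the objects that satisfy every attribute = value pair."""
    return [
        name
        for name, row in table.items()
        if all(row[attr] == value for attr, value in conditions.items())
    ]


def confidence(table, premise, decision):
    """Share of the objects matching `premise` that also match `decision`."""
    covered = matching(table, premise)
    hits = matching(table, {**premise, **decision})
    return len(hits) / len(covered)


r1_conf = confidence(TABLE, {"b": "P"}, {"d": "L"})            # rule r1
r2_conf = confidence(TABLE, {"a": 2, "b": "S"}, {"d": "H"})    # rule r2

# Objects the action rule (b, P -> S) => (d, L -> H) could be applied to:
# currently b = P and d = L, and a = 2 as in r2 (assumed requirement).
candidates = matching(TABLE, {"a": 2, "b": "P", "d": "L"})

print(r1_conf, r2_conf, candidates)
```

On this table both rules hold without exception, and x5 and x6 are the low-profitability customers the action rule suggests moving from b = P to b = S.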

18

Practical Examples

Next, we see some action rules extracted from 3 different databases, 2 in the medical domain and 1 in the financial domain:

- Binding to Thrombin database
- Insurance Company Benchmark database
- Breast Cancer database

However, we first need to introduce one more notion: the cost of an action rule.

19

Cost of Action Rule

Usually, there is a cost (monetary or moral) associated with undertaking some kind of action. For example:

- decreasing the interest rate on a customer account (re-classifying the customer from one interest rate group to another) may cost us mailing a letter to the customer and doing some internal administration on the account, say $5.
- relocating an employee from one city to another (re-classifying the employee from one division to another) may cost us the moving expense, in addition to a moral cost: some negative emotions of the employee about it may influence his/her future performance or perception of our organization.

We will denote the cost by ρ, a number from 0 to +∞. The cost will be close to 0 if the action is trivial (very easy to accomplish) and close to +∞ if the action is very difficult (almost impossible) to accomplish.

20

Cost of Action Rule

Assumption: Let S = (X, A, V) be an information system, where X are the objects, A are the attributes, and V are the values of the attributes. Assume attribute b ∈ A is flexible, and b1, b2 ∈ V_b (b1 and b2 are some of the values of b).

By ρ_S(X, b1, b2) we mean a number from (0, +∞] which describes the average predicted cost of the approved action associated with a possible re-classification of qualifying objects in X from class b1 to class b2.

An object x qualifies for re-classification from b1 to b2 if b(x) = b1 (currently the value of b is b1 for that object).

21

Cost of Action Rule

Action rule r: [(b1, v1 → w1) ∧ (b2, v2 → w2) ∧ … ∧ (bp, vp → wp)](x) ⇒ (d, k1 → k2)(x)

The cost of the left hand side of the rule r (costLeft) is equal to the sum of the costs of the terms listed in the left hand side:

costLeft = Σ{ρ_S(vi, wi) : 1 ≤ i ≤ p}

Action rule r is feasible in S if costLeft < ρ_S(k1, k2).

For any feasible action rule r, the cost of the conditional part (left hand side) of r is lower than the cost of its decision part (right hand side, where the decision attribute is listed).
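The feasibility test on this slide is a single comparison: add up the costs of the left hand side terms and check that the total is strictly below the cost of changing the decision from k1 to k2. The sketch below illustrates that under stated assumptions: the function names `cost_left` and `is_feasible` are hypothetical, and the cost values are made up for the example.

```python
import math


def cost_left(term_costs):
    """Sum of the costs of the terms on the left hand side of an action rule."""
    return sum(term_costs)


def is_feasible(term_costs, decision_cost):
    """True when the left hand side is strictly cheaper than the decision change."""
    return cost_left(term_costs) < decision_cost


# Made-up costs: two cheap changes against a decision change that costs 0.4.
cheap = is_feasible([0.002, 0.015], 0.4)

# A term that is (almost) impossible to change has cost +infinity,
# which makes the whole rule infeasible.
blocked = is_feasible([0.002, math.inf], 0.4)

print(cheap, blocked)
```

Using an infinite cost for an impossible change means the same comparison also rules out rules that rely on attributes nobody can influence.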

22

Binding to Thrombin Database

The first database, the Binding to Thrombin database, is used for drug design and was provided in the KDD Cup 2001 Competition.

Drugs are typically small organic molecules that achieve their desired activity by binding to a target site on a receptor. The first step in the discovery of a new drug is usually to identify and isolate the receptor to which it should bind, followed by testing many small molecules for their ability to bind to the target site.

This leaves researchers with the task of determining what separates the active (binding) compounds from the inactive (non-binding) ones. Such a determination can then be used in the design of new compounds that not only bind, but also have all the other properties required for a drug (solubility, oral absorption, lack of side effects, appropriate duration of action, toxicity, etc.).

23

Binding to Thrombin Database

The data set consists of 1909 compounds (the objects / rows in the database) tested for their ability to bind to a target site on thrombin, a key receptor in blood clotting.

Each compound is described by binary features (the attributes / columns in the database), which describe three-dimensional properties of the molecule. Biological activity in general, and receptor binding affinity in particular, correlate with various structural and physical properties of small organic molecules. The task with KDD Cup 2001 was to determine which of these properties are critical in this case and to learn to accurately predict the class value: Active or Inactive.

In this testing we use the class attribute, which has value A for active and I for inactive, as the re-classification attribute for the action rules. In this way, we suggest to the user which molecular properties can be changed in order to reclassify the chemical compound from the inactive to the active class, so that it binds to thrombin.

24

Binding to Thrombin Database

The following results were found with LowestCostReclassifier (software for extracting action rules of lowest cost):

decisionAttribute = activity
valueFrom = 0
valueTo = 1

dataFile= thrombin
numberOfObjects: 1908

minConfidenceL1 = 0.65
minFeasibilityL3 = 0.0001
knownCost = 0.3
maxCostL2 = 0.01

---- Goal Node:
(f304, 1->0 | 0.12327) => (activity, 0->1 | 0.3) 1
(f172, 0->1 | 0.00472118) => (f304, 1->0 | 0.12327) 0.998424

---- Action Rule of Min Cost Found: ----
(f172, 0->1 | 0.00472118) => (activity, 0->1 | 0.3) 0.998424
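One way to read this log is that the expensive change of f304 in the goal rule is replaced by the cheaper change of f172 that brings it about, which lowers the cost of the left hand side below maxCostL2. The sketch below only reproduces that cost arithmetic. It is not a description of how LowestCostReclassifier is implemented: the `Term` and `Rule` classes and the `substitute` function are hypothetical, and rule confidence is deliberately left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    attribute: str
    change: str   # for example "1->0"
    cost: float   # cost printed after "|" in the log


@dataclass(frozen=True)
class Rule:
    left: Tuple[Term, ...]
    right: Term


def left_cost(rule: Rule) -> float:
    """Total cost of the terms on the left hand side."""
    return sum(t.cost for t in rule.left)


def substitute(goal: Rule, helper: Rule) -> Rule:
    """Replace the goal term that `helper` produces by helper's left hand side,
    but only if that makes the left hand side cheaper (hypothetical step)."""
    target = (helper.right.attribute, helper.right.change)
    if not any((t.attribute, t.change) == target for t in goal.left):
        return goal
    kept = tuple(t for t in goal.left if (t.attribute, t.change) != target)
    candidate = Rule(kept + helper.left, goal.right)
    return candidate if left_cost(candidate) < left_cost(goal) else goal


# Values copied from the log above.
activity = Term("activity", "0->1", 0.3)
f304 = Term("f304", "1->0", 0.12327)
f172 = Term("f172", "0->1", 0.00472118)
MAX_COST_L2 = 0.01

goal = Rule((f304,), activity)
helper = Rule((f172,), f304)
result = substitute(goal, helper)

print(result.left, left_cost(result) < MAX_COST_L2)
```

Under this reading, the goal rule alone costs 0.12327, above the 0.01 limit, while the substituted rule costs 0.00472118 and stays below it.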

25

Insurance Company Benchmark database

The next database used is in the financial domain: the Insurance Company Benchmark (COIL 2000) database, used with the CoIL 2000 Challenge.

The data contains 5,822 tuples (the customers / rows in the database). The features (the attributes / columns in the database) include product usage data and socio-demographic data derived from zip area codes.

The data was supplied by the Dutch data mining company Sentient Machine Research and is based on a real world business problem.

In our testing the user would like to reclassify the attribute Contribution car policies from a value of 5 to 6.

26

Insurance Company Benchmark database

The following results were found with LowestCostReclassifier:

decisionAttribute = Contribution car policies
valueFrom = 5
valueTo = 6

dataFile= Insurance
numberOfObjects: 5822

minConfidenceL1 = 0.72
minFeasibilityL3 = 0.0001
knownCost = 0.4
maxCostL2 = 0.02

---- Goal Node:
(Private health insurance, 3->4 | 0.02423844) => (Contribution car policies, 5->6 | 0.4) 0.833333
(High level education, 2->3 | 0.00176027) ^ (Social class B1, 2->4 | 0.0146667) => (Private health insurance, 3->4 | 0.3) 0.714286

---- Action Rule of Min Cost Found: ----
(High level education, 2->3 | 0.00176027) ^ (Social class B1, 2->4 | 0.0146667) => (Contribution car policies, 5->6 | 0.4) 0.714286

27

Breast Cancer Database

Another database used is a breast cancer database. It was obtained at the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg.

It contains a class attribute which classifies the tumor as benign or malignant. The rest of the attributes (columns in the database) describe common factors that radiologists and pathologists examine in order to make the diagnosis, such as Clump Thickness, Uniformity of Cell Shape, Bare Nuclei, etc.

The database has 699 instances (the objects / rows in the database): Benign: 458 (65.5%) and Malignant: 241 (34.5%).

The class attribute is used as the re-classification attribute with LowestCostReclassifier. This way, we provide suggestions/actions to be undertaken in order to change the class from malignant to benign.

28

Breast Cancer Database

The following results were found with LowestCostReclassifier:

decisionAttribute = Class
valueFrom = 4
valueTo = 2

dataFile= brcancer
numberOfObjects: 699

minConfidenceL1 = 0.7
minFeasibilityL3 = 0.0001
knownCost = 0.3
maxCostL2 = 0.00059
...

!!!!!!!List of sifted action rules is empty
i.e. no feasible action rules were found, so
this state/child has no descendents:
(Marginal Adhesion, 1->3 | 0.00128449)

@@@@@@@@ Traversing next node
The list Q of nodes traversed was empty.

29

Breast Cancer Database

---- Best Node ----
(Uniformity of Cell Size, 8->1 | 0.00179706) ^ (Normal Nucleoli, 5->3 | 0.00721622) => (Class, 4->2 | 0.3) 1
(Marginal Adhesion, 1->3 | 0.00128449) => (Uniformity of Cell Size, 8->1 | 0.00179706) 0.85

---- Action Rule of Min Cost Found: ----
(Marginal Adhesion, 1->3 | 0.00128449) => (Class, 4->2 | 0.3) 0.85

In this case, the goal could not be reached, possibly because the desired maximum cost specified, 0.00059, was too low. However, the best node found thus far was still returned; its cost, 0.00128449, is still lower than the cost currently known to the user, 0.3.