1 Computer Science and Decision Making Fred Roberts, Rutgers University

Post on 22-Dec-2015


Page 1: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

1

Computer Science and Decision Making

Fred Roberts, Rutgers University


Computer Science and Decision Making

• Many recent applications in CS involve issues/problems of long interest in decision theory:
   - social choice or consensus
   - preference, utility
   - conflict and cooperation
   - allocation
   - incentives
   - measurement

• Methods developed in decision theory, often in the social sciences, beginning to be used in CS


CS and DM

• CS applications place great strain on SS-DM methods
   - Sheer size of problems addressed
   - Computational power of agents an issue
   - Limitations on information possessed by players
   - Sequential nature of repeated applications

• Thus: Need for new generation of SS-DM methods

• Also: These new methods will provide powerful tools to social scientists/decision makers.


CS and DM: Outline

1. Consensus Rankings
2. Meta-search and Collaborative Filtering
3. Computational Approaches to Information Management in Group Decision Making
4. Large Databases and Inference
5. Consensus Computing, Image Processing
6. Computational Intractability of Consensus Functions
7. Electronic Voting
8. Software and Hardware Measurement
9. Power of a Voter
10. Sequential DM: Port of Entry Inspections


Consensus Rankings: Social Choice

• Relevant social science problems: voting, group decision making
• Goal: based on everyone’s opinions, reach a “consensus”
• Typical opinions expressed as:
   - “first choice”
   - ranking of all alternatives
   - scores
   - classifications

• Long history of research on such problems.


Consensus Rankings

Background: Arrow’s Impossibility Theorem:

• There is no “consensus method” that satisfies certain reasonable axioms about how societies should reach decisions.
• Input to Arrow’s Theorem: rankings of alternatives (ties allowed).
• Output: consensus ranking.

Kenneth Arrow, Nobel prize winner


Consensus Rankings

• There are widely studied and widely used consensus methods (that violate one or more of Arrow’s conditions).

• One well-known consensus method: “Kemeny-Snell medians”: Given set of rankings, find ranking minimizing sum of distances to other rankings.

• Kemeny-Snell medians are having surprising new applications in CS.

John Kemeny, pioneer in time sharing in CS


Consensus Rankings

• Kemeny-Snell distance between rankings: twice the number of pairs of candidates i and j for which i is ranked above j in one ranking and below j in the other + the number of pairs that are ranked in one ranking and tied in another.

a      b
x      y-z
y      x
z

On {x,y}: +2
On {x,z}: +2
On {y,z}: +1
d(a,b) = 5.
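The distance just defined can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `ks_distance` and the encoding of a ranking as a dict from candidate to level (0 = top, tied candidates share a level) are assumptions made here.

```python
from itertools import combinations

def ks_distance(r1, r2):
    """Kemeny-Snell distance between two rankings.

    Each ranking is a dict mapping a candidate to its level
    (0 = top); tied candidates share a level. (Hypothetical encoding.)
    """
    total = 0
    for i, j in combinations(sorted(r1), 2):
        # Sign of the comparison in each ranking: -1, 0 (tie), or +1.
        s1 = (r1[i] > r1[j]) - (r1[i] < r1[j])
        s2 = (r2[i] > r2[j]) - (r2[i] < r2[j])
        # Opposite strict orders add 2; strict versus tied adds 1.
        total += abs(s1 - s2)
    return total

a = {"x": 0, "y": 1, "z": 2}   # x above y above z
b = {"y": 0, "z": 0, "x": 1}   # y and z tied, both above x
print(ks_distance(a, b))       # 5
```

The pair {y,z} contributes 1 because it is ordered in a and tied in b, which matches the hand count on the slide.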


Consensus Rankings

• Kemeny-Snell median: Given rankings a1, a2, …, ap, find a ranking x so that

d(a1,x) + d(a2,x) + … + d(ap,x) is minimized.

• x can be a ranking other than a1, a2, …, ap.

• Sometimes just called Kemeny median.


Consensus Rankings

a1        a2        a3
Fish      Fish      Chicken
Chicken   Chicken   Fish
Beef      Beef      Beef

• Median = a1.

• If x = a1: d(a1,x) + d(a2,x) + d(a3,x) = 0 + 0 + 2 = 2 is minimized.
• If x = a3, the sum is 4.
• For any other x, the sum is at least 1 + 1 + 1 = 3.


Consensus Rankings

a1        a2        a3
Fish      Chicken   Beef
Chicken   Beef      Fish
Beef      Fish      Chicken

• Three medians = a1, a2, a3.

• This is the “voter’s paradox” situation.


Consensus Rankings

a1        a2        a3
Fish      Chicken   Beef
Chicken   Beef      Fish
Beef      Fish      Chicken

• Note that sometimes we wish to minimize

d(a1,x)^2 + d(a2,x)^2 + … + d(ap,x)^2

• A ranking x that minimizes this is called a Kemeny-Snell mean.

• In this example, there is one mean: the ranking declaring all three alternatives tied.


Consensus Rankings

a1        a2        a3
Fish      Chicken   Beef
Chicken   Beef      Fish
Beef      Fish      Chicken

• If x is the ranking declaring Fish, Chicken and Beef tied, then

d(a1,x)^2 + d(a2,x)^2 + … + d(ap,x)^2 = 3^2 + 3^2 + 3^2 = 27.

• Not hard to show this is minimum.
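For three alternatives the minimum can be checked by exhaustive search over all 13 rankings with ties. The sketch below is an illustration under assumptions made here: rankings are dicts from alternative to level (0 = top), and the helper names `weak_orders`, `consensus` and `strict` are hypothetical. Exhaustive search is only practical for tiny examples.

```python
from itertools import combinations, product

def ks_distance(r1, r2):
    """Kemeny-Snell distance; rankings are dicts item -> level (0 = top)."""
    total = 0
    for i, j in combinations(sorted(r1), 2):
        s1 = (r1[i] > r1[j]) - (r1[i] < r1[j])
        s2 = (r2[i] > r2[j]) - (r2[i] < r2[j])
        total += abs(s1 - s2)   # 2 if reversed, 1 if tied in only one
    return total

def weak_orders(items):
    """Yield every ranking of items with ties allowed."""
    n = len(items)
    for levels in product(range(n), repeat=n):
        # Canonical labelings only: the levels used are exactly 0..m.
        if set(levels) == set(range(max(levels) + 1)):
            yield dict(zip(items, levels))

def consensus(profile, power):
    """Brute force: power=1 gives medians, power=2 gives means."""
    items = sorted(profile[0])
    scored = [(sum(ks_distance(a, x) ** power for a in profile), x)
              for x in weak_orders(items)]
    best = min(score for score, _ in scored)
    return best, [x for score, x in scored if score == best]

def strict(*order):
    """Ranking without ties, listed from most to least preferred."""
    return {item: level for level, item in enumerate(order)}

paradox = [strict("Fish", "Chicken", "Beef"),
           strict("Chicken", "Beef", "Fish"),
           strict("Beef", "Fish", "Chicken")]

median_score, medians = consensus(paradox, power=1)
mean_score, means = consensus(paradox, power=2)
print(median_score, len(medians))   # 8 3
print(mean_score, means)            # 27 [{'Beef': 0, 'Chicken': 0, 'Fish': 0}]
```

With `power=1` the same search returns a1, a2 and a3 as the three medians of this profile; with `power=2` the all-tied ranking is the only minimizer.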


Consensus Rankings

Theorem (Bartholdi, Tovey, and Trick, 1989; Wakabayashi, 1986): Computing the Kemeny median of a set of rankings is an NP-complete problem.


Consensus Rankings

Okay, so what does this have to do with practical computer science questions?


Consensus Rankings

I mean really practical computer science questions.


Meta-search and Collaborative Filtering

Meta-search

• A consensus problem
• Combine page rankings from several search engines
• Dwork, Kumar, Naor, Sivakumar (2000): Kemeny-Snell medians good in spam resistance in meta-search (a page spams if it causes meta-search to rank it too highly)

• Approximation methods make this computationally tractable


Meta-search and Collaborative Filtering

Collaborative Filtering

• Recommending books or movies
• Combine book or movie ratings
• Produce ordered list of books or movies to recommend
• Freund, Iyer, Schapire, Singer (2003): “Boosting” algorithm for combining rankings.
• Related topic: Recommender Systems


Meta-search and Collaborative Filtering

A major difference from SS-DM applications:

• In SS-DM applications, number of voters is large, number of candidates is small.

• In CS applications, number of voters (search engines) is small, number of candidates (pages) is large.

• This makes for major new complications and research challenges.


Computational Approaches to Information Management in Group Decision Making

Representation and Elicitation

• Successful group decision making requires efficient elicitation of information and efficient representation of the information elicited.

• Old problems in the social sciences.
• Computational aspects becoming a focal point because of need to deal with massive and complex information.


Computational Approaches to Information Management in Group Decision Making

Representation and Elicitation

• Example I: Preferences are key components in decision making applications.

• “I prefer beef to fish”

• Extracting and representing preferences is key in group decision making applications.


Computational Approaches to Information Management in Group Decision Making

Representation and Elicitation

• “Brute force” approach: For every pair of alternatives, ask which is preferred to the other.

• Often computationally infeasible.


Computational Approaches to Information Management in Group Decision Making

Representation and Elicitation

• In many applications (e.g., collaborative filtering), important to elicit preferences automatically.

• CP-nets introduced as tool to represent preferences succinctly and provide ways to make inferences about preferences (Boutilier, Brafman, Domshlak, Hoos, Poole 2004).


Computational Approaches to Information Management in Group Decision Making

Representation and Elicitation

• Example II: combinatorial auctions.
• Auctions increasingly used in business and government.
• Information technology allows complex auctions with huge number of bidders.
• There are key decision making problems arising in auctions.


Computational Approaches to Information Management in Group Decision Making

Representation and Elicitation

• Bidding functions maximizing expected profit can be exceedingly difficult to compute.
• Determining the winner of an auction can be extremely hard. (Rothkopf, Pekec, Harstad 1998)


Computational Approaches to Information Management in Group Decision Making

Representation and Elicitation

Combinatorial Auctions

• Multiple goods auctioned off.
• Submit bids for combinations of goods.
• This leads to NP-complete allocation problems.
• Might not even be able to feasibly express all possible preferences for all subsets of goods.
• Rothkopf, Pekec, Harstad (1998): determining winner is computationally tractable for many economically interesting kinds of combinations.


Computational Approaches to Information Management in Group Decision Making

Representation and Elicitation

Combinatorial Auctions

• Decision maker needs to elicit preferences from all agents for all plausible combinations of items in the auction.

• Similar problem arises in optimal bundling of goods and services.

• Elicitation requires exponentially many queries in general.


Computational Approaches to Information Management in Group Decision Making

Representation and Elicitation

• Challenge: Recognize situations in which efficient elicitation and representation is possible.
• One result: Fishburn, Pekec, Reeds (2002)
• Even more complicated: When objects in auction have complex structure.
• Problem arises in: Legal reasoning, sequential decision making, automatic decision devices, collaborative filtering.


Large Databases and Inference

• Decision makers consult massive data sets.
• Real data often in form of sequences.
• Example: Bioinformatics.
• GenBank has over 7 million sequences comprising 8.6 billion bases.
• The search for similarity or patterns has extended from pairs of sequences to finding patterns that appear in common in a large number of sequences or throughout the database: “consensus sequences”
• Emerging field of “Bioconsensus”: applies SS-DM consensus methods to biological databases.


Large Databases and Inference

Why look for such patterns?

Similarities between sequences or parts of sequences lead to the discovery of shared phenomena.

For example, it was discovered that the sequence for platelet-derived growth factor, which causes growth in the body, is 87% identical to the sequence for v-sis, a cancer-causing gene. This led to the discovery that v-sis works by stimulating growth.


Large Databases and Inference

Example

Bacterial Promoter Sequences studied by Waterman (1989):

RRNABP1: ACTCCCTATAATGCGCCA
TNAA:    GAGTGTAATAATGTAGCC
UVRBP2:  TTATCCAGTATAATTTGT
SFC:     AAGCGGTGTTATAATGCC

Notice that if we are looking for patterns of length 4, each sequence has the pattern TAAT.
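A quick way to confirm the observation is to collect every length-4 substring of each sequence and intersect the sets. This is a minimal sketch using exact matching only; the helper name `kmers` is an assumption made here, and exact matching may return more than one shared pattern.

```python
def kmers(seq, k):
    """All substrings of length k that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

promoters = {
    "RRNABP1": "ACTCCCTATAATGCGCCA",
    "TNAA":    "GAGTGTAATAATGTAGCC",
    "UVRBP2":  "TTATCCAGTATAATTTGT",
    "SFC":     "AAGCGGTGTTATAATGCC",
}

# Patterns of length 4 that occur in every one of the four sequences.
common = set.intersection(*(kmers(s, 4) for s in promoters.values()))
print("TAAT" in common)   # True
```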


Large Databases and Inference

Example

However, suppose that we add another sequence:

M1 RNA: AACCCTCTATACTGCGCG

The pattern TAAT does not appear here. However, it almost appears, since the pattern TACT appears, and this has only one mismatch from the pattern TAAT.

So, in some sense, the pattern TAAT is a good consensus pattern.


Large Databases and Inference Example

We make this precise using best mismatch distance.

Consider two sequences a and b with b longer than a.

Then d(a,b) is the smallest number of mismatches in all possible alignments of a as a consecutive subsequence of b.


Large Databases and Inference

Example

a = 0011, b = 111010

Possible Alignments:

111010      111010      111010
0011         0011         0011

The best-mismatch distance is 2, which is achieved in the third alignment.
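The best-mismatch distance amounts to sliding a along b and counting disagreements at each offset. The following is a minimal sketch; the function name `best_mismatch` is an assumption made here.

```python
def best_mismatch(a, b):
    """Fewest mismatches over all placements of a as a consecutive block of b."""
    if len(a) > len(b):
        raise ValueError("a must not be longer than b")
    return min(
        sum(x != y for x, y in zip(a, b[start:start + len(a)]))
        for start in range(len(b) - len(a) + 1)
    )

print(best_mismatch("0011", "111010"))               # 2
print(best_mismatch("TAAT", "AACCCTCTATACTGCGCG"))   # 1
```

The second call is the M1 RNA sequence from the earlier example: TAAT is one mismatch away from the TACT that occurs there.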


Large Databases and Inference

Smith-Waterman

• Let Σ be a finite alphabet of size at least 2 and Π be a finite collection of words of length L on Σ.
• Let F(Π) be the set of words of length k ≥ 2 that are our consensus patterns. (Assume L ≥ k.)
• Let Π = {a1, a2, …, an}.
• One way to define F(Π) is as follows.
• Let d(a,b) be the best-mismatch distance.
• Consider nonnegative parameters s_d that are monotone decreasing with d and let F(a1,a2, …, an) be all those words w of length k that maximize

$S(w) = \sum_i s_{d(w,a_i)}$


Large Databases and Inference

• We call such an F a Smith-Waterman consensus.
• In particular, Waterman and others use the parameters

s_d = (k-d)/k.

Example:

• An alphabet used frequently is the purine/pyrimidine alphabet {R,Y}, where R = A (adenine) or G (guanine) and Y = C (cytosine) or T (thymine).
• For simplicity, it is easier to use the digits 0,1 rather than the letters R,Y.
• Thus, let Σ = {0,1}, let k = 2. Then the possible pattern words are 00, 01, 10, 11.


Large Databases and Inference

• Suppose a1 = 111010, a2 = 111111. How do we find F(a1,a2)?
• We have:

d(00,a1) = 1, d(00,a2) = 2
d(01,a1) = 0, d(01,a2) = 1
d(10,a1) = 0, d(10,a2) = 1
d(11,a1) = 0, d(11,a2) = 0

$S(00) = \sum_i s_{d(00,a_i)} = s_1 + s_2$

$S(01) = \sum_i s_{d(01,a_i)} = s_0 + s_1$

$S(10) = \sum_i s_{d(10,a_i)} = s_0 + s_1$

$S(11) = \sum_i s_{d(11,a_i)} = s_0 + s_0$

• As long as s_0 > s_1 > s_2, it follows that 11 is the consensus pattern, according to Smith-Waterman’s consensus.
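The example can be reproduced by scoring every candidate word of length k. This is a sketch under stated assumptions: the function names are hypothetical, the alphabet is {0,1}, and the parameters default to the choice (k-d)/k mentioned earlier, held as exact fractions to avoid rounding ties.

```python
from fractions import Fraction
from itertools import product

def best_mismatch(w, a):
    """Fewest mismatches of w against any consecutive block of a."""
    return min(sum(x != y for x, y in zip(w, a[i:i + len(w)]))
               for i in range(len(a) - len(w) + 1))

def consensus_patterns(words, k, alphabet="01", s=None):
    """Return the top-scoring words of length k and all scores S(w)."""
    if s is None:
        # Assumed default: s_d = (k - d) / k for d = 0..k.
        s = [Fraction(k - d, k) for d in range(k + 1)]
    scores = {}
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        scores[w] = sum(s[best_mismatch(w, a)] for a in words)
    top = max(scores.values())
    return sorted(w for w, v in scores.items() if v == top), scores

winners, scores = consensus_patterns(["111010", "111111"], k=2)
print(winners)   # ['11']
```

Enumerating all words of length k grows as the alphabet size to the power k, so this direct approach is meant only for short patterns.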


Example:

• Let Σ = {0,1}, k = 3, and consider F(a1,a2,a3) where a1 = 000000, a2 = 100000, a3 = 111110. The possible pattern words are: 000, 001, 010, 011, 100, 101, 110, 111.

d(000,a1) = 0, d(000,a2) = 0, d(000,a3) = 2,
d(001,a1) = 1, d(001,a2) = 1, d(001,a3) = 2,
d(100,a1) = 1, d(100,a2) = 0, d(100,a3) = 1, etc.

S(000) = s_2 + 2s_0, S(001) = s_2 + 2s_1, S(100) = 2s_1 + s_0, etc.

• Now, s_0 > s_1 > s_2 implies that S(000) > S(001).
• Similarly, one shows that the score is maximized by S(000) or S(100).
• Monotonicity doesn’t say which of these is highest.
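The last point can be seen numerically by trying different decreasing parameter lists on this example. The three lists below are illustrative choices made here, not values from the talk; each holds s_0, s_1, s_2, s_3, and integer values are used because rescaling all parameters by a positive constant does not change which word scores highest.

```python
from itertools import product

def best_mismatch(w, a):
    """Fewest mismatches of w against any consecutive block of a."""
    return min(sum(x != y for x, y in zip(w, a[i:i + len(w)]))
               for i in range(len(a) - len(w) + 1))

def top_patterns(words, k, s, alphabet="01"):
    """Words of length k with the largest score S(w) under parameters s."""
    scores = {}
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        scores[w] = sum(s[best_mismatch(w, a)] for a in words)
    top = max(scores.values())
    return sorted(w for w, v in scores.items() if v == top)

words = ["000000", "100000", "111110"]
# Each list is strictly decreasing, as the monotonicity condition requires.
print(top_patterns(words, 3, [3, 2, 1, 0]))    # ['000', '100']  ((k-d)/k scaled by k)
print(top_patterns(words, 3, [10, 9, 1, 0]))   # ['100']
print(top_patterns(words, 3, [10, 2, 1, 0]))   # ['000']
```

Under the (k-d)/k parameters the two candidates tie; moving s_1 closer to s_0 favors 100, and moving it closer to s_2 favors 000.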


Large Databases and Inference

The Special Case s_d = (k-d)/k

• Then it is easy to show that the words w that maximize S(w) are exactly the words w that minimize

$\sum_i d(w,a_i)$.

• In other words: In this case, the Smith-Waterman consensus is exactly the median.

Algorithms for computing consensus sequences such as Smith-Waterman are important in modern molecular biology.
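The equivalence in the special case follows from one line of algebra. Writing n for the number of words a_1, …, a_n and substituting s_d = (k-d)/k into the score (the notation below restates the definitions above):

```latex
\begin{align*}
S(w) &= \sum_{i=1}^{n} s_{d(w,a_i)}
      = \sum_{i=1}^{n} \frac{k - d(w,a_i)}{k}
      = n - \frac{1}{k} \sum_{i=1}^{n} d(w,a_i).
\end{align*}
```

Since n and k are fixed, S(w) is largest exactly when the sum of distances is smallest, which is the median condition.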


Large Databases and Inference

Other Topics in “Bioconsensus”

• Alternative phylogenies (evolutionary trees) are produced using different methods and we need to choose a consensus tree.

• Alternative taxonomies (classifications) are produced using different models and we need to choose a consensus taxonomy.

• Alternative molecular sequences are produced using different criteria or different algorithms and we need to choose a consensus sequence.

• Alternative sequence alignments are produced and we need to choose a consensus alignment.


Large Databases and Inference

Other Topics in “Bioconsensus”

• Several recent books on bioconsensus.
   - Day and McMorris [2003]
   - Janowitz, et al. [2003]

• Bibliography compiled by Bill Day: In molecular biology alone, hundreds of papers using consensus methods in biology.

• Large database problems in CS are being approached using methods of “bioconsensus” having their origin in SS-DM.


Consensus Computing, Image Processing

• Old SS-DM problem: Dynamic modeling of how individuals change opinions over time, eventually reaching consensus.
• Often use dynamic models on graphs
• Related to neural nets.
• CS applications: distributed computing.
• Values of processors in a network are updated until all have same value.
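One simple dynamic of this kind, assumed here purely for illustration, has every processor repeatedly replace its value with the average of its own value and its neighbors' values. The network, starting values and function name below are hypothetical.

```python
def averaging_consensus(neighbors, values, rounds=200):
    """Synchronously replace each node's value by the average of itself and its neighbors."""
    current = dict(values)
    for _ in range(rounds):
        current = {
            node: (current[node] + sum(current[m] for m in nbrs)) / (1 + len(nbrs))
            for node, nbrs in neighbors.items()
        }
    return current

# Hypothetical 4-processor path network 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
start = {0: 10.0, 1: 0.0, 2: 4.0, 3: 6.0}

final = averaging_consensus(path, start)
spread = max(final.values()) - min(final.values())
print(spread < 1e-9)   # True
```

On a connected network this update drives all values together. The common value is a weighted average of the starting values, with better-connected nodes carrying more weight, so it need not equal the plain mean.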


Consensus Computing, Image Processing

• CS application: Noise removal in digital images
• Does a pixel level represent noise?
• Compare neighboring pixels.
• If values beyond threshold, replace pixel value with mean or median of values of neighbors.
• Related application in distributed computing.
• Values of faulty processors are replaced by those of neighboring non-faulty ones.
• Berman and Garay (1993) use “parliamentary procedure” called cloture
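The pixel rule can be sketched as a thresholded median filter. This is one possible reading, with assumptions made here: the image is a list of rows of gray levels, a pixel's neighbors are the up to eight surrounding pixels, and the threshold value is arbitrary.

```python
from statistics import median

def denoise(image, threshold):
    """Replace a pixel by the median of its neighbors when it differs from that median by more than threshold."""
    rows, cols = len(image), len(image[0])
    result = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            # Up to eight surrounding pixels; fewer at the border.
            neighbors = [image[rr][cc]
                         for rr in range(max(r - 1, 0), min(r + 2, rows))
                         for cc in range(max(c - 1, 0), min(c + 2, cols))
                         if (rr, cc) != (r, c)]
            m = median(neighbors)
            if abs(image[r][c] - m) > threshold:
                result[r][c] = m
    return result

noisy = [[10, 10, 10],
         [10, 200, 10],
         [10, 10, 10]]
cleaned = denoise(noisy, threshold=50)
print(cleaned[1][1] == 10)   # True: the spike is replaced by its neighbors' median
```

Using the median rather than the mean keeps a single extreme neighbor from pulling the replacement value, which is why the pixels next to the spike are left unchanged here.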


Consensus Computing, Image Processing

• Side comment: same models are being applied in “computational and mathematical epidemiology”.
• Modeling the spread of disease through large social networks.
   - SARS
   - Measles


Computational Intractability of Consensus Functions

• How hard is it to compute the winner of an election?

• We know counting votes can be difficult and time consuming.
• However:

• Bartholdi, Tovey and Trick (1989): There are voting schemes where it can be computationally intractable to determine who won an election.


Computational Intractability of Consensus Functions

• So, is computational intractability necessarily bad?

• Computational intractability can be a good thing in an election: Designing voting systems where it is computationally intractable to “manipulate” the outcome of an election by “insincere voting”:
   - Adding voters
   - Declaring voters ineligible
   - Adding candidates
   - Declaring candidates ineligible


Computational Intractability of Consensus Functions

• Given a set A of all possible candidates and a set I of all possible voters.

• Suppose we know voter i’s ranking of all candidates in A, for every voter i.

• Given a subset of I of eligible voters, a particular candidate a in A, and a number k, is there a set of at most k ineligible voters who can be declared eligible so that candidate a is the winner?

• Bartholdi, Tovey, Trick (1989): For some consensus functions (voting rules), this is an NP-complete problem.


Computational Intractability of Consensus Functions

• Given a set A of all possible candidates and a set I of all possible voters.

• Suppose we know voter i’s ranking of all candidates in A, for every voter i.

• Given a subset of I of eligible voters, a particular candidate a in A, and a number k, is there a set of at most k eligible voters who can be declared ineligible so that candidate a is the winner?

• Bartholdi, Tovey, Trick (1989): For some consensus functions (voting rules), this is an NP-complete problem.
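To make the "adding voters" question concrete, here is a brute-force sketch that tries every set of at most k ineligible voters. Plurality voting is assumed only because it is easy to state; the hardness results quoted above concern some voting rules, not necessarily this one. All names and ballots are hypothetical.

```python
from collections import Counter
from itertools import combinations

def plurality_winner(ballots):
    """Unique plurality winner, or None on a tie. Each ballot lists candidates best first."""
    tally = Counter(ballot[0] for ballot in ballots).most_common(2)
    if not tally or (len(tally) == 2 and tally[0][1] == tally[1][1]):
        return None
    return tally[0][0]

def control_by_adding(eligible, ineligible, target, k):
    """Smallest set of at most k ineligible voters whose addition makes target win, else None."""
    for size in range(k + 1):
        for added in combinations(sorted(ineligible), size):
            ballots = list(eligible.values()) + [ineligible[v] for v in added]
            if plurality_winner(ballots) == target:
                return set(added)
    return None

eligible = {"v1": ["a", "b"], "v2": ["b", "a"], "v3": ["b", "a"]}
ineligible = {"v4": ["a", "b"], "v5": ["a", "b"], "v6": ["b", "a"]}

print(control_by_adding(eligible, ineligible, "a", k=1))           # None
print(sorted(control_by_adding(eligible, ineligible, "a", k=2)))   # ['v4', 'v5']
```

The search examines every subset of size up to k, so its cost grows quickly with the number of ineligible voters; the question for a given voting rule is whether anything substantially better exists.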


Computational Intractability of Consensus Functions

• Software agents may be more likely to manipulate than individuals (Conitzer and Sandholm 2002):
   - Humans don’t think about manipulating
   - Computation can be tedious.
   - Software agents are good at running algorithms
   - Software agents only need to have code for manipulation written once.
   - All the more reason to develop computational barriers to manipulation.


Computational Intractability of Consensus Functions

• Stopping those software agents:


Computational Intractability of Consensus Functions

• Conitzer and Sandholm (2002):
   - Earlier results of difficulty of manipulation depend on large number of candidates
   - New results: manipulation possible with some voting methods if smaller number (bounded number) of candidates
   - In weighted voting, voters may have different numbers of votes (as in US presidential elections, where different states (= voters) have different numbers of votes). Here, manipulation is harder.
   - Manipulation difficult when uncertainty about others’ votes.


Computational Intractability of Consensus Functions

• Conitzer and Sandholm (2006):
   - Try to find voting rules for which manipulation is usually hard.
   - Why is this difficult to do?
   - One explanation: under one reasonable assumption, it is impossible to find such rules.


Aside: Electronic Voting

• Issues:
   - Correctness
   - Anonymity
   - Availability
   - Security
   - Privacy


Electronic Voting

Security Risks in Electronic Voting

• Threat of “denial of service attacks”
• Threat of penetration attacks involving a delivery mechanism to transport a malicious payload to target host (through Trojan horse or remote control program)
• Private and correct counting of votes
• Cryptographic challenges to keep votes private
• Relevance of work on secure multiparty computation


Electronic Voting

Other CS Challenges:

• Resistance to “vote buying”
• Development of user-friendly interfaces
• Vulnerabilities of communication path between the voting client (where you vote) and the server (where votes are counted)

• Reliability issues: random hardware and software failures


Software & Hardware Measurement

• Theory of measurement developed by mathematical social scientists.
• Measurement theory studies ways to combine scores obtained on different criteria.
• A statement involving scales of measurement is considered meaningful if its truth or falsity is unchanged under acceptable transformations of all scales involved.

• Example: It is meaningful to say that I weigh more than my daughter.

• That is because if it is true in kilograms, then it is also true in pounds, in grams, etc.


Software & Hardware Measurement

• Measurement theory has studied what statements you can make after averaging scores.
• Think of averaging as a consensus (DM) method.
• One general principle: To say that the average score of one set of tests is greater than the average score of another set of tests is not meaningful (it is meaningless) under certain conditions.
• This is often the case if the averaging procedure is to take the arithmetic mean: If s(xi) is score of xi, i = 1, 2, …, n, then arithmetic mean is

$\sum_i s(x_i)/n$.

• Long literature on what averaging methods lead to meaningful conclusions.
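A small numeric illustration of how a comparison of arithmetic means can fail to be meaningful. It assumes, for the sake of the example, that the scores are only ordinal, so that any strictly increasing rescaling is an acceptable transformation; the scores and the rescaling are made up here.

```python
from statistics import mean

# Hypothetical test scores for two groups (all nonnegative).
group_a = [1, 5]
group_b = [3, 3.5]

def rescale(x):
    """Strictly increasing on nonnegative scores, so it preserves their order."""
    return x ** 2

before = mean(group_a) < mean(group_b)
after = mean([rescale(x) for x in group_a]) < mean([rescale(x) for x in group_b])
print(before, after)   # True False
```

The statement "group B has the higher average" is true on the original scale and false after an order-preserving rescaling, so under this assumption its truth depends on an arbitrary choice of scale.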


Software & Hardware Measurement

A widely used method in hardware measurement:

• Score a computer system on different benchmarks.
• Normalize score relative to performance of one base system
• Average normalized scores
• Pick system with highest average.

Fleming and Wallace (1986): Outcome can depend on choice of base system.

• Meaningless in sense of measurement theory
• Leads to theory of merging normalized scores


Software & Hardware Measurement

Hardware Measurement

                        BENCHMARK
PROCESSOR      E      F      G         H      I
R            417     83     66    39,449    772
M            244     70    153    33,527    368
Z            134     70    135    66,000    369

Data from Heath, Comput. Archit. News (1984)


Software & Hardware Measurement

Normalize Relative to Processor R

Each cell shows the raw score with the normalized score in parentheses.

                                    BENCHMARK
PROCESSOR   E            F           G            H               I
R           417 (1.00)   83 (1.00)   66 (1.00)    39,449 (1.00)   772 (1.00)
M           244 (.59)    70 (.84)    153 (2.32)   33,527 (.85)    368 (.48)
Z           134 (.32)    70 (.84)    135 (2.05)   66,000 (1.67)   369 (.48)


Software & Hardware Measurement

Take Arithmetic Mean of Normalized Scores

Each cell shows the raw score with the score normalized to processor R in parentheses.

                                    BENCHMARK
PROCESSOR   E            F           G            H               I            Arithmetic Mean
R           417 (1.00)   83 (1.00)   66 (1.00)    39,449 (1.00)   772 (1.00)   1.00
M           244 (.59)    70 (.84)    153 (2.32)   33,527 (.85)    368 (.48)    1.01
Z           134 (.32)    70 (.84)    135 (2.05)   66,000 (1.67)   369 (.48)    1.07

Conclude that machine Z is best


Software & Hardware Measurement

Now Normalize Relative to Processor M

Each cell shows the raw score with the normalized score in parentheses.

                                    BENCHMARK
PROCESSOR   E            F           G            H               I
R           417 (1.71)   83 (1.19)   66 (.43)     39,449 (1.18)   772 (2.10)
M           244 (1.00)   70 (1.00)   153 (1.00)   33,527 (1.00)   368 (1.00)
Z           134 (.55)    70 (1.00)   135 (.88)    66,000 (1.97)   369 (1.00)


Software & Hardware Measurement

Take Arithmetic Mean of Normalized Scores

Each cell shows the raw score with the score normalized to processor M in parentheses.

                                    BENCHMARK
PROCESSOR   E            F           G            H               I            Arithmetic Mean
R           417 (1.71)   83 (1.19)   66 (.43)     39,449 (1.18)   772 (2.10)   1.32
M           244 (1.00)   70 (1.00)   153 (1.00)   33,527 (1.00)   368 (1.00)   1.00
Z           134 (.55)    70 (1.00)   135 (.88)    66,000 (1.97)   369 (1.00)   1.08

Conclude that machine R is best
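The dependence on the base system can be reproduced directly from the raw scores. This is a minimal sketch; the dictionary layout and function names are assumptions made here, while the numbers are the Heath data shown in the tables.

```python
from statistics import mean

# Raw benchmark scores (benchmarks E, F, G, H, I) for each processor.
raw = {
    "R": [417, 83, 66, 39449, 772],
    "M": [244, 70, 153, 33527, 368],
    "Z": [134, 70, 135, 66000, 369],
}

def arithmetic_scores(base):
    """Arithmetic mean of each processor's scores after dividing by the base processor's scores."""
    return {p: mean([x / b for x, b in zip(scores, raw[base])])
            for p, scores in raw.items()}

def best(scores):
    """Processor with the highest average."""
    return max(scores, key=scores.get)

print(best(arithmetic_scores("R")))   # Z
print(best(arithmetic_scores("M")))   # R
```

The raw data never change between the two calls; only the choice of base does, and with it the winner.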

Page 75: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

75

Software and Hardware Measurement
• So, the conclusion that a given machine is best by taking the arithmetic mean of normalized scores is meaningless in this case.

• Above example from Fleming and Wallace (1986), data from Heath (1984)

• Sometimes, the geometric mean is helpful.
• The geometric mean of $x_1, x_2, \dots, x_n$ is

$\left( \prod_{i=1}^{n} x_i \right)^{1/n}$

Page 76: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

76

Software & Hardware Measurement: Normalize Relative to Processor R

Raw benchmark scores, with scores normalized relative to processor R in parentheses:

             BENCHMARK
PROCESSOR    E             F            G             H                I             Geometric Mean
R            417 (1.00)    83 (1.00)    66 (1.00)     39,449 (1.00)    772 (1.00)    1.00
M            244 (.59)     70 (.84)     153 (2.32)    33,527 (.85)     368 (.48)     .86
Z            134 (.32)     70 (.85)     135 (2.05)    66,000 (1.67)    369 (.45)     .84

Conclude that machine R is best

Page 77: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

77

Software & Hardware Measurement: Now Normalize Relative to Processor M

Raw benchmark scores, with scores normalized relative to processor M in parentheses:

             BENCHMARK
PROCESSOR    E             F            G             H                I             Geometric Mean
R            417 (1.71)    83 (1.19)    66 (.43)      39,449 (1.18)    772 (2.10)    1.17
M            244 (1.00)    70 (1.00)    153 (1.00)    33,527 (1.00)    368 (1.00)    1.00
Z            134 (.55)     70 (1.00)    135 (.88)     66,000 (1.97)    369 (1.00)    .99

Still conclude that machine R is best
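
The comparison on the last few slides can be replayed from the raw scores. The following is a minimal sketch, assuming (as the slides do) that a higher normalized score is better; the names `SCORES`, `arithmetic_mean`, `geometric_mean`, and `best_machine` are illustrative and not from the talk.

```python
from math import prod

# Raw benchmark scores (benchmarks E, F, G, H, I) for processors R, M, Z.
SCORES = {
    "R": [417, 83, 66, 39449, 772],
    "M": [244, 70, 153, 33527, 368],
    "Z": [134, 70, 135, 66000, 369],
}

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return prod(xs) ** (1 / len(xs))

def best_machine(reference, mean):
    """Normalize every score by the reference processor, average, pick the largest."""
    base = SCORES[reference]
    averages = {
        name: mean([s / b for s, b in zip(scores, base)])
        for name, scores in SCORES.items()
    }
    return max(averages, key=averages.get)

for reference in ("R", "M"):
    print(reference,
          best_machine(reference, arithmetic_mean),
          best_machine(reference, geometric_mean))
```

With the arithmetic mean the winner changes with the reference processor; with the geometric mean it does not.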

Page 78: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

78

Software and Hardware Measurement
• In this situation, it is easy to show that the conclusion that a given machine has the highest geometric mean normalized score is a meaningful conclusion.

• Even meaningful: A given machine has geometric mean normalized score 20% higher than another machine.

• Fleming and Wallace give general conditions under which comparing geometric means of normalized scores is meaningful.

• Research area: what averaging procedures make sense in what situations? Large literature.

• Note: There are situations where comparing arithmetic means is meaningful but comparing geometric means is not.
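
A short worked step behind the first two bullets, taking "meaningful" here to mean that the conclusion does not change when a different machine is used for normalization (an assumption about the intended sense). The symbols $x_i$, $y_i$ (scores of two machines on benchmark $i$) and $r_i$ (scores of the reference machine) are introduced only for this sketch.

```latex
\[
\frac{\left(\prod_{i=1}^{n} x_i / r_i\right)^{1/n}}
     {\left(\prod_{i=1}^{n} y_i / r_i\right)^{1/n}}
= \left(\frac{\prod_{i=1}^{n} x_i}{\prod_{i=1}^{n} y_i}\right)^{1/n}
\]
```

The reference scores $r_i$ cancel, so both the ranking and a ratio statement such as "20% higher" are the same for every choice of reference machine.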

Page 79: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

79

Technical Aside: How Should We Average Scores?

Unknown function $u = F(a_1, a_2, \dots, a_n)$

The values $a_1, a_2, \dots, a_n$ are scores and F is some averaging function.

Luce's idea (“Principle of Theory Construction”): If you know the scale types of the $a_i$ and the scale type of u and you assume that an admissible transformation of each of the $a_i$ leads to an admissible transformation of u, you can derive the form of F.

Admissible transformation: pounds into grams, Fahrenheit into Centigrade. Scale type determined by class of admissible transformations.

Ratio scales, interval scales, …

(Photo: R. Duncan Luce)

Page 80: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

80

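How Should We Average Scores?

Example: $a_1, a_2, \dots, a_n$ are independent ratio scales, u is a ratio scale.

$F: (\mathbb{R}^+)^n \to \mathbb{R}^+$

$F(a_1, a_2, \dots, a_n) = u \;\Rightarrow\; F(\alpha_1 a_1, \alpha_2 a_2, \dots, \alpha_n a_n) = \alpha u$,

$\alpha_1 > 0, \alpha_2 > 0, \dots, \alpha_n > 0$, $\alpha > 0$, where $\alpha$ depends on $\alpha_1, \alpha_2, \dots, \alpha_n$.

•Thus we get the functional equation:

(*) $F(\alpha_1 a_1, \alpha_2 a_2, \dots, \alpha_n a_n) = A(\alpha_1, \alpha_2, \dots, \alpha_n)\, F(a_1, a_2, \dots, a_n)$,

$A(\alpha_1, \alpha_2, \dots, \alpha_n) > 0$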

Page 81: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

81

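How Should We Average Scores?

(*) $F(\alpha_1 a_1, \alpha_2 a_2, \dots, \alpha_n a_n) = A(\alpha_1, \alpha_2, \dots, \alpha_n)\, F(a_1, a_2, \dots, a_n)$,

$A(\alpha_1, \alpha_2, \dots, \alpha_n) > 0$

Theorem (Luce 1964): If $F: (\mathbb{R}^+)^n \to \mathbb{R}^+$ is continuous and satisfies (*), then there are $\lambda > 0$, $c_1, c_2, \dots, c_n$ so that

$F(a_1, a_2, \dots, a_n) = \lambda\, a_1^{c_1} a_2^{c_2} \cdots a_n^{c_n}$.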

Page 82: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

82

How Should We Average Scores?

(Aczél, Roberts, Rosenbaum 1986): It is easy to see that the assumption of continuity can be weakened to continuity at a point, monotonicity, or boundedness on an (arbitrarily small, open) n-dimensional interval or on a set of positive measure. Call any of these assumptions regularity.

(Photo: Janos Aczél, “Mr. Functional Equations”)

Page 83: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

83

How Should We Average Scores?

Theorem (Aczél and Roberts 1989): If in addition F satisfies reflexivity and symmetry, then $\lambda = 1$ and $c_1 = c_2 = \dots = c_n = 1/n$, so F is the geometric mean.

Reflexivity: $F(a, a, \dots, a) = a$

Symmetry: $F(a_1, a_2, \dots, a_n) = F(a_{\pi(1)}, a_{\pi(2)}, \dots, a_{\pi(n)})$ for all permutations $\pi$ of {1,2,…,n}
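
A sketch of why reflexivity and symmetry force the geometric mean, starting from the product form $\lambda a_1^{c_1} \cdots a_n^{c_n}$ given by Luce's theorem. This is an illustrative argument, not necessarily the proof in Aczél and Roberts (1989).

```latex
\begin{align*}
\text{Reflexivity:}\quad & F(a,\dots,a) = \lambda\, a^{c_1 + \dots + c_n} = a \ \text{ for all } a > 0 \\
& a = 1 \ \Rightarrow\ \lambda = 1, \qquad \text{hence } c_1 + \dots + c_n = 1 \\
\text{Symmetry:}\quad & a_i^{c_i} a_j^{c_j} = a_i^{c_j} a_j^{c_i} \ \text{ for all } a_i, a_j > 0
  \ \Rightarrow\ c_i = c_j \\
\text{Together:}\quad & c_1 = \dots = c_n = \tfrac{1}{n}, \qquad
  F(a_1,\dots,a_n) = \Big(\prod_{i=1}^{n} a_i\Big)^{1/n}
\end{align*}
```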

Page 84: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

84

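How Should We Average Scores?

Sometimes You Get the Arithmetic Mean

Example: $a_1, a_2, \dots, a_n$ interval scales with the same unit and independent zero points; u an interval scale.

Functional Equation:

(**) $F(\alpha a_1 + \beta_1, \alpha a_2 + \beta_2, \dots, \alpha a_n + \beta_n) = A(\alpha, \beta_1, \beta_2, \dots, \beta_n)\, F(a_1, a_2, \dots, a_n) + B(\alpha, \beta_1, \beta_2, \dots, \beta_n)$,

$A(\alpha, \beta_1, \beta_2, \dots, \beta_n) > 0$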

Page 85: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

85

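How Should We Average Scores?

Functional Equation:

(**) $F(\alpha a_1 + \beta_1, \alpha a_2 + \beta_2, \dots, \alpha a_n + \beta_n) = A(\alpha, \beta_1, \beta_2, \dots, \beta_n)\, F(a_1, a_2, \dots, a_n) + B(\alpha, \beta_1, \beta_2, \dots, \beta_n)$,

$A(\alpha, \beta_1, \beta_2, \dots, \beta_n) > 0$

Solutions to (**) (Aczél, Roberts, and Rosenbaum 1986):

$F(a_1, a_2, \dots, a_n) = \sum_{i=1}^{n} \lambda_i a_i + b$

$\lambda_1, \lambda_2, \dots, \lambda_n$, b arbitrary constants

(no continuity assumptions needed)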
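
As a quick check (a sketch, not taken from the cited paper), one can substitute the weighted-sum form $\sum_i \lambda_i a_i + b$ into (**) and read off suitable $A$ and $B$:

```latex
\begin{align*}
F(\alpha a_1 + \beta_1, \dots, \alpha a_n + \beta_n)
  &= \sum_{i=1}^{n} \lambda_i (\alpha a_i + \beta_i) + b \\
  &= \alpha \Big( \sum_{i=1}^{n} \lambda_i a_i + b \Big)
     + \sum_{i=1}^{n} \lambda_i \beta_i + (1 - \alpha)\, b \\
  &= A\, F(a_1, \dots, a_n) + B,
  \qquad A = \alpha > 0, \quad B = \sum_{i=1}^{n} \lambda_i \beta_i + (1 - \alpha)\, b
\end{align*}
```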

Page 86: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

86

How Should We Average Scores?

Theorem (Aczél and Roberts 1989):

(1). If in addition F satisfies reflexivity, then $\sum_{i=1}^{n} \lambda_i = 1$, $b = 0$.

(2). If in addition F satisfies reflexivity and symmetry, then $\lambda_i = 1/n$ for all i, and $b = 0$, i.e., F is the arithmetic mean.

Page 87: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

87

Software and Hardware Measurement

• Message from measurement theory to computer science (and DM):

Do not perform arithmetic operations on data without paying attention to whether the conclusions you get are meaningful.

Page 88: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

88

CS and DM: Outline
1. Consensus Rankings
2. Meta-search and Collaborative Filtering
3. Computational Approaches to Information Management in Group Decision Making
4. Large Databases and Inference
5. Consensus Computing, Image Processing
6. Computational Intractability of Consensus Functions
7. Electronic Voting
8. Software and Hardware Measurement
9. Power of a Voter
10. Sequential DM: Port of Entry Inspections

Page 89: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

89

Power of a Voter
Shapley-Shubik Power Index

• Think of a “voting game”
• Shapley-Shubik index measures the power of each player in a multi-player game.
• Consider a game in which some coalitions of players win and some lose, with no subset of a losing coalition winning.

(Photos: Lloyd Shapley, Martin Shubik)

Page 90: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

90

Power of a Voter

Shapley-Shubik Power Index
• Consider a coalition forming at random, one player at a time.
• A player i is pivotal if addition of i throws coalition from losing to winning.
• Shapley-Shubik index of i = probability i is pivotal if an order of players is chosen at random.

• Power measure applying to more general games than voting games is called Shapley Value.

Page 91: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

91

Power of a Voter

Example: Shareholders of Company
Shareholder 1 holds 3 shares.
Shareholders 2, 3, 4, 5, 6, 7 hold 1 share each.
A majority of shares are needed to make a decision.
Coalition {1,4,6} is winning.
Coalition {2,3,4,5,6} is winning.

Shareholder 1 is pivotal if he is 3rd, 4th, or 5th.
So shareholder 1’s Shapley value is 3/7.
Sum of Shapley values is 1 (since they are probabilities).
Thus, each other shareholder has Shapley value (4/7)/6 = 2/21.
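
The shareholder example is small enough to check by brute force over all 7! orderings. A minimal sketch, assuming a majority means at least 5 of the 9 shares; the function name `shapley_shubik` and its parameters are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def shapley_shubik(weights, quota):
    """Fraction of player orderings in which each player is pivotal."""
    n = len(weights)
    pivots = [0] * n
    orders = 0
    for order in permutations(range(n)):
        orders += 1
        running = 0
        for player in order:
            running += weights[player]
            if running >= quota:      # coalition just turned from losing to winning
                pivots[player] += 1
                break
    return [Fraction(p, orders) for p in pivots]

# Shareholder 1 holds 3 shares, shareholders 2..7 hold 1 share each.
index = shapley_shubik([3, 1, 1, 1, 1, 1, 1], quota=5)
print(index[0], index[1])
```

Enumeration like this grows as n!, which is why it only works for toy games.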

Page 92: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

92

Power of a Voter

Example: United Nations Security Council

•15 member nations
•5 permanent members: China, France, Russia, UK, US
•10 non-permanent
•Permanent members have veto power
•Coalition wins iff it has all 5 permanent members and at least 4 of the 10 non-permanent members.

Page 93: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

93

Power of a Voter

Example: United Nations Security Council

•What is the power of each member of the Security Council?
•Fix non-permanent member i.
•i is pivotal in permutations in which all permanent members precede i and exactly 3 non-permanent members do.
•How many such permutations are there?

Page 94: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

94

Power of a Voter
Example: United Nations Security Council
•Choose 3 non-permanent members preceding i.
•Order all 8 members preceding i.
•Order remaining 6 non-permanent members.
•Thus the number of such permutations is:

$C(9,3) \times 8! \times 6! = \frac{9!}{3!\,6!} \times 8! \times 6! = \frac{9!\,8!}{3!}$

•The probability that i is pivotal = power of non-permanent member =

$\frac{9!\,8!}{3!\,15!} \approx .001865$

•The power of a permanent member =

$[1 - 10 \times .001865]/5 \approx .1963$

•Permanent members have roughly 100 times the power of non-permanent members.
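
The counting argument above can be evaluated exactly. A minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

# Orderings in which a fixed non-permanent member i is pivotal:
# choose the 3 non-permanent members before i, order the 8 members before i,
# then order the 6 non-permanent members after i.
favorable = comb(9, 3) * factorial(8) * factorial(6)

non_permanent = Fraction(favorable, factorial(15))
permanent = (1 - 10 * non_permanent) / 5   # the 15 indices sum to 1
ratio = permanent / non_permanent

print(float(non_permanent), float(permanent), float(ratio))
```

The exact ratio is a little over 100, consistent with the rounded figure on the slide.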

Page 95: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

95

Power of a Voter

•There are a variety of other power indices used in game theory and political science (Banzhaf index, Coleman index, …)

•Need to calculate them for huge games
•Mostly computationally intractable

Page 96: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

96

Power of a Voter: Allocation/Sharing Costs and Revenues

• Shapley-Shubik power index and more general Shapley value have been used to allocate costs to different users in shared projects.
Allocating runway fees in airports
Allocating highway fees to trucks of different sizes
Universities sharing library facilities
Fair allocation of telephone calling charges among users sharing complex phone systems (Cornell’s experiment)

Page 97: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

97

Power of a Voter: Allocating/Sharing Costs and Revenues

Allocating Runway Fees at Airports
• Some planes require longer runways. Charge them more for landings. How much more?

• Divide runways into meter-long segments.

• Each month, we know how many landings a plane has made.

• Given a runway of length y meters, consider a game in which the players are landings and a coalition “wins” if the runway is not long enough for planes in the coalition.

Page 98: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

98

Power of a Voter: Allocating/Sharing Costs and Revenues

Allocating Runway Fees at Airports
• A landing is pivotal if it is the first landing added that makes a coalition require a longer runway.

• The Shapley value gives the cost of the yth meter of runway allocated to a given landing.

• We then add up these costs over all runway lengths a plane requires and all landings it makes.
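
One way to read the scheme above: in the game for the y-th meter, the pivotal landing is the first one in a random order that needs at least y meters, so each of the k landings needing that meter is pivotal with probability 1/k and pays 1/k of its cost. The sketch below sums these shares over meters. The data, the unit cost per meter, and the name `runway_fees` are hypothetical.

```python
from fractions import Fraction

def runway_fees(required_lengths, cost_per_meter=1):
    """required_lengths[k] = meters of runway needed by landing k (made-up data)."""
    fees = [Fraction(0)] * len(required_lengths)
    for y in range(1, max(required_lengths) + 1):
        # Landings that need the y-th meter share its cost equally.
        users = [k for k, need in enumerate(required_lengths) if need >= y]
        share = Fraction(cost_per_meter, len(users))
        for k in users:
            fees[k] += share
    return fees

fees = runway_fees([1, 2, 3])
print(fees)
```

The fees add up to the cost of the longest runway, so the full cost is recovered.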

Page 99: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

99

Power of a Voter: Allocating/Sharing Costs and Revenues

Multicasting

• Applications in multicasting.
• Unicast routing: Each packet sent from a source is delivered to a single receiver.
• Sending it to multiple sites: Send multiple copies and waste bandwidth.
• In multicast routing: Use a directed tree connecting source to all receivers.
• At branch points, a packet is duplicated as necessary.

Page 100: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

100

Multicasting

Page 101: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

101

Power of a Voter: Allocating/Sharing Costs and Revenues

Multicasting

• Multicast routing: Use a directed tree connecting source to all receivers.

• At branch points, a packet is duplicated as necessary.

• Bandwidth is not directly attributable to a single receiver.

• How to distribute costs among receivers?• One idea: Use Shapley value.

Page 102: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

102

Allocating/Sharing Costs and Revenues
• Feigenbaum, Papadimitriou, Shenker (2001): no feasible implementation for Shapley value in multicasting.

• Note: Shapley value is uniquely characterized by four simple axioms.

• Sometimes we state axioms as general principles we want a solution concept to have.

• Jain and Vazirani (1998): polynomial time computable cost-sharing algorithm
Satisfying some important axioms
Calculating cost of optimum multicast tree within factor of two of optimal.

Page 103: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

103

CS and DM: Outline
1. Consensus Rankings
2. Meta-search and Collaborative Filtering
3. Computational Approaches to Information Management in Group Decision Making
4. Large Databases and Inference
5. Consensus Computing, Image Processing
6. Computational Intractability of Consensus Functions
7. Electronic Voting
8. Software and Hardware Measurement
9. Power of a Voter
10. Sequential DM: Port of Entry Inspections

Page 104: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

104

Algorithms for Port of Entry Inspection for WMDs

Joint work with Saket Anand, David Madigan, Richard Mammone, Saumitr Pathak, Philip Stroud

Page 105: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

105

Port of Entry Inspection Algorithms

•Goal: Find ways to intercept illicit nuclear materials and weapons destined for the U.S. via the maritime transportation system

•Currently inspecting only small % of containers arriving at ports

•Even inspecting 8% of containers in Port of NY/NJ might bring international trade to a halt (Larrabbee 2002)

Page 106: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

106

Port of Entry Inspection Algorithms
•Aim: Develop decision support algorithms that will help us to “optimally” intercept illicit materials and weapons subject to limits on delays, manpower, and equipment

•Find inspection schemes that minimize total “cost” including “cost” of false positives and false negatives

Mobile Vacis: truck-mounted gamma ray imaging system

Page 107: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

107

Sequential Decision Making Problem
•Stream of containers arrives at a port
•The Decision Maker’s Problem:
•Which to inspect?
•Which inspections next based on previous results?

•Approach:
–“decision logics”
–combinatorial optimization methods
–Builds on ideas of Stroud and Saeger at Los Alamos National Laboratory
–Need for new models and methods

Page 108: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

108

Sequential Diagnosis Problem
•Such sequential diagnosis problems arise in many areas:

–Communication networks (testing connectivity, paging cellular customers, sequencing tasks, …)
–Manufacturing (testing machines, fault diagnosis, routing customer service calls, …)
–Artificial intelligence/CS (optimal derivation strategies in knowledge bases, best-value satisficing search, coding decision trees, …)
–Medicine (diagnosing patients, sequencing treatments, …)

Page 109: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

109

Sequential Decision Making Problem
•Containers arriving to be classified into categories.
•Simple case: 0 = “ok”, 1 = “suspicious”

•Inspection scheme: specifies which inspections are to be made based on previous observations

Page 110: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

110

Sequential Decision Making Problem

•Containers have attributes, each in a number of states

•Sample attributes:
–Levels of certain kinds of chemicals or biological materials
–Whether or not there are items of a certain kind in the cargo list
–Whether cargo was picked up in a certain port

Page 111: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

111

Sequential Decision Making Problem

•Currently used attributes:
–Does ship’s manifest set off an “alarm”?
–What is the neutron or gamma emission count? Is it above threshold?
–Does a radiograph image come up positive?
–Does an induced fission test come up positive?

Gamma ray detector

Page 112: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

112

Sequential Decision Making Problem

•We can imagine many other attributes.
•Concern with general algorithmic approaches.
•Seek a methodology not tied to today’s technology.
•Detectors are evolving quickly.

Page 113: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

113

Sequential Decision Making Problem

•Simplest Case: Attributes are in state 0 or 1

•Then: Container is a binary string like 011001

•So: Classification is a decision function F that assigns each binary string to a category.

011001 → F(011001)

If attributes 2, 3, and 6 are present and others are not, assign container to category F(011001).

Page 114: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

114

Sequential Decision Making Problem

•If there are two categories, 0 and 1, decision function F is a boolean function.

Example: F(000) = F(111) = 1, F(abc) = 0 otherwise

This classifies a container as positive iff it has none of the attributes or all of them.


Page 115: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

115

Sequential Decision Making Problem

•Given a container, test its attributes until we know enough to calculate the value of F.

•An inspection scheme tells us in which order to test the attributes to minimize cost.

•Even this simplified problem is hard computationally.

Page 116: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

116

Sequential Decision Making Problem
•This assumes F is known.
•Simplifying assumption: Attributes are independent.
•At any point we can stop inspecting and output the value of F based on outcomes of inspections so far.
•Complications: May be precedence relations in the components (e.g., can’t test attribute a4 before testing a6).
•Or: cost may depend on attributes tested before.
•F may depend on variables that cannot be directly tested or for which tests are too costly.

Page 117: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

117

Sequential Decision Making Problem

•Such problems are hard computationally.
•There are many possible boolean functions F.
•Even if F is fixed, problem of finding a good classification scheme (to be defined precisely below) is NP-complete.
•Several classes of functions F allow for efficient inspection schemes:

–k-out-of-n systems
–Certain series-parallel systems
–Read-once systems
–“regular” systems
–Horn systems

Page 118: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

118

Sensors and Inspection Lanes
•n types of sensors measure presence or absence of the n attributes.
•Many copies of each sensor.
•Complication: different characteristics of sensors.
•Entities come for inspection.
•Which sensor of a given type to use?
•Think of inspection lanes and queues.
•Besides efficient inspection schemes, could decrease costs by:

–Buying more sensors
–Changing allocation of containers to sensor lanes.

Page 119: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

119

Binary Decision Tree Approach
•Sensors measure presence/absence of attributes.

•Binary Decision Tree (BDT):
–Nodes are sensors or categories (0 or 1)
–Two arcs exit from each sensor node, labeled left and right.
–Take the right arc when sensor says the attribute is present, left arc otherwise

Page 120: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

120

Binary Decision Tree Approach

•Reach category 1 from the root only through the path a0 to a1 to 1.

•Container is classified in category 1 iff it has both attributes a0 and a1 .

•Corresponding boolean function F(11) = 1, F(10) = F(01) = F(00) = 0.

Figure 1

Page 121: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

121

Binary Decision Tree Approach
•Reach category 1 from the root by:
a0 L to a1 R to a2 R to 1, or
a0 R to a2 R to 1

•Container classified in category 1 iff it has
a1 and a2 and not a0, or
a0 and a2 and possibly a1.

•Corresponding boolean function F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise.

Figure 2

Page 122: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

122

Binary Decision Tree Approach
•This binary decision tree corresponds to the same boolean function

F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise.

However, it has one less observation node ai. So, it is more efficient if all observations are equally costly and equally likely.

Figure 3
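
A minimal sketch of evaluating binary decision trees, since the figures themselves are not reproduced in this transcript. `FIGURE_2` encodes the tree described by the two paths on the previous slide, assuming every other branch ends in category 0. `SMALLER` is one hypothetical three-node tree for the same function (test a2 first); it may or may not be the tree in Figure 3. All names are illustrative.

```python
from itertools import product

# A tree is a category (0 or 1) or a tuple (attribute_index, left_subtree, right_subtree).
# Take the right subtree when the attribute is present (1), the left one otherwise.
FIGURE_2 = (0, (1, 0, (2, 0, 1)), (2, 0, 1))
SMALLER = (2, 0, (0, (1, 0, 1), 1))

def classify(tree, container):
    """Follow the tree for a container given as a tuple of 0/1 attribute states."""
    while not isinstance(tree, int):
        attribute, left, right = tree
        tree = right if container[attribute] == 1 else left
    return tree

def sensor_nodes(tree):
    """Number of observation (sensor) nodes in the tree."""
    if isinstance(tree, int):
        return 0
    _, left, right = tree
    return 1 + sensor_nodes(left) + sensor_nodes(right)

def target(container):
    """F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise."""
    return 1 if container in {(1, 1, 1), (1, 0, 1), (0, 1, 1)} else 0

containers = list(product((0, 1), repeat=3))
print(all(classify(FIGURE_2, c) == target(c) for c in containers),
      all(classify(SMALLER, c) == target(c) for c in containers),
      sensor_nodes(FIGURE_2), sensor_nodes(SMALLER))
```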

Page 123: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

123

Binary Decision Tree Approach
•Even if the boolean function F is fixed, the problem of finding the “optimal” binary decision tree for it is very hard (NP-complete).

•For small n = number of attributes, can try to solve it by brute force enumeration.

•Even for n = 4, not practical.
•(n = 4 at Port of Long Beach-Los Angeles)

Port of Long Beach

Page 124: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

124

Binary Decision Tree Approach
Promising Approaches:

•Heuristic algorithms, approximations to optimal.
•Special assumptions about the boolean function F.
•For “monotone” boolean functions, integer programming formulations give promising heuristics.
•Stroud and Saeger enumerate all “complete,” monotone boolean functions and calculate the least expensive corresponding binary decision trees.
•Their method is practical for n up to 4, not n = 5.

Page 125: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

125

Binary Decision Tree Approach
Monotone Boolean Functions:

•Given two strings x1x2…xn, y1y2…yn

•Suppose that xi ≤ yi for all i implies that F(x1x2…xn) ≤ F(y1y2…yn).
•Then we say that F is monotone.
•Then 11…1 has highest probability of being in category 1.

Page 126: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

126

Binary Decision Tree Approach

Incomplete Boolean Functions:

•Boolean function F is incomplete if F can be calculated by finding at most n-1 attributes and knowing the value of the input string on those attributes
•Example: F(111) = F(110) = F(101) = F(100) = 1, F(000) = F(001) = F(010) = F(011) = 0.
•F(abc) is determined without knowing b (or c).
•F is incomplete.

Page 127: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

127

Binary Decision Tree Approach
Complete, Monotone Boolean Functions:

•Stroud and Saeger: “brute force” algorithm for enumerating binary decision trees implementing complete, monotone boolean functions and choosing least cost BDT.
•Feasible to implement up to n = 4.
•n = 2:
–There are 6 monotone boolean functions.
–Only 2 of them are complete, monotone
–There are 4 binary decision trees for calculating these 2 complete, monotone boolean functions.

Page 128: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

128

Binary Decision Tree Approach

Complete, Monotone Boolean Functions:

•n = 3:
–9 complete, monotone boolean functions.
–60 distinct binary trees for calculating them

•n = 4:
–114 complete, monotone boolean functions.
–11,808 distinct binary decision trees for calculating them.
–(Compare 1,079,779,602 BDTs for all boolean functions)
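
The function counts quoted above can be reproduced by direct enumeration for small n. The sketch below is not the Stroud-Saeger algorithm (which enumerates trees, not just functions); it only counts monotone and complete, monotone boolean functions, treating "complete" as "depends on every attribute". The name `count_monotone` is illustrative.

```python
from itertools import product

def count_monotone(n):
    """Return (number of monotone functions, number that are also complete)."""
    inputs = list(product((0, 1), repeat=n))
    position = {x: k for k, x in enumerate(inputs)}
    # (low, high, i): two inputs that differ only in attribute i (0 in low, 1 in high).
    pairs = [(position[x], position[x[:i] + (1,) + x[i + 1:]], i)
             for x in inputs for i in range(n) if x[i] == 0]
    monotone = complete = 0
    for table in product((0, 1), repeat=len(inputs)):   # every truth table
        if any(table[lo] > table[hi] for lo, hi, _ in pairs):
            continue                                    # not monotone
        monotone += 1
        # Attribute i matters if flipping it changes F somewhere.
        relevant = {i for lo, hi, i in pairs if table[lo] != table[hi]}
        if len(relevant) == n:
            complete += 1
    return monotone, complete

for n in (2, 3, 4):
    print(n, count_monotone(n))
```

The loop visits all 2^(2^n) truth tables, so this approach already stops being practical at n = 5.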

Page 129: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

129

Binary Decision Tree Approach

Complete, Monotone Boolean Functions:

•n = 5:
–6894 complete, monotone boolean functions
–263,515,920 corresponding binary decision trees.

•Combinatorial explosion!
•Need alternative approaches; enumeration not feasible!
•(Even worse: compare 5 x 10^18 BDTs corresponding to all boolean functions)

Page 130: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

130

Cost Functions

•Stroud-Saeger method applies to more sophisticated cost models, not just cost = number of sensors in the BDT.
•Using a sensor has a cost:

–Unit cost of inspecting one item with it
–Fixed cost of purchasing and deploying it
–Delay cost from queuing up at the sensor station

•Preliminary problem: disregard fixed and delay costs. Minimize unit costs.

Page 131: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

131

Cost Functions: Delay Costs
•Tradeoff between fixed costs and delay costs: Adding more sensors cuts down on delays.
•Stochastic process of containers arriving
•Distribution of delay times for inspections
•Use queuing theory to find average delay times under different models

Page 132: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

132

Cost Functions: Unit Costs

Tree Utilization
•Complication: Assume cost depends on how many nodes of BDT are actually visited during an “average” container’s inspection. (This is sum of unit costs.)
•Depends on characteristics of population of entities being inspected.
•I.e., depends on “distribution” of containers.
•In our early models, we assume we are given probability of sensor errors and probability of bomb in a container.
•This allows us to calculate “expected” cost of utilization of the tree Cutil.

Page 133: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

133

Cost Functions

•Cost of false positive: Cost of additional tests.

–If it means opening the container, it’s very expensive.

•Cost of false negative:
–Complex issue.
–What is cost of a bomb going off in Manhattan?

Page 134: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

134

Cost Functions: Sensor Errors
•One Approach to False Positives/Negatives: Assume there can be Sensor Errors
•Simplest model: assume that all sensors checking for attribute ai have same fixed probability of saying ai is 0 if in fact it is 1, and similarly saying it is 1 if in fact it is 0.
•More sophisticated analysis later describes a model for determining probabilities of sensor errors.
•Notation:
X = state of nature (bomb or no bomb)
Y = outcome (of sensor or entire inspection process).

Page 135: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

135

Probability of Error for the Entire Tree

State of nature is zero (X = 0): absence of a bomb
State of nature is one (X = 1): presence of a bomb

[Figure: binary decision tree with sensor nodes A, B, C and category leaves 0 and 1, shown once for each error type]

Probability of false positive, P(Y=1|X=0), for this tree is given by

PFalsePositive = P(Y=1|X=0) = P(YA=1|X=0) * P(YB=1|X=0) + P(YA=1|X=0) * P(YB=0|X=0) * P(YC=1|X=0)

Probability of false negative, P(Y=0|X=1), for this tree is given by

PFalseNegative = P(Y=0|X=1) = P(YA=0|X=1) + P(YA=1|X=1) * P(YB=0|X=1) * P(YC=0|X=1)

Page 136: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

136

Cost Function used for Evaluating the Decision Trees.

CTot = CFalsePositive *PFalsePositive + CFalseNegative *PFalseNegative + Cutil

CFalsePositive is the cost of false positive (Type I error)
CFalseNegative is the cost of false negative (Type II error)
PFalsePositive is the probability of a false positive occurring
PFalseNegative is the probability of a false negative occurring
Cutil is the expected cost of utilization of the tree.
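
A minimal sketch of this cost function for the three-sensor tree whose error formulas appear on the previous slide: test A; if A reads 0 output 0; otherwise test B; if B reads 1 output 1; otherwise test C and output its reading. Several things here are assumptions, not from the talk: the sensor numbers are the A, B, C values of the Stroud-Saeger experiments read as one row per sensor, sensor readings are treated as independent given X, the error probabilities in CTot are weighted by P(X=0) and P(X=1), and the example arguments to `total_cost` are arbitrary points inside the ranges used in the sensitivity analysis.

```python
# Per sensor: (unit cost, P(Y=1 | X=1), P(Y=1 | X=0)).
SENSORS = {
    "A": (0.25, 0.90, 0.10),
    "B": (10.0, 0.99, 0.01),
    "C": (30.0, 0.999, 0.001),
}

def p_alarm(sensor, x):
    """P(sensor reads 1 | X = x)."""
    _, hit, false_alarm = SENSORS[sensor]
    return hit if x == 1 else false_alarm

def p_tree_says_one(x):
    """P(Y=1 | X=x): A and B both read 1, or A reads 1, B reads 0, C reads 1."""
    a, b, c = (p_alarm(s, x) for s in "ABC")
    return a * b + a * (1 - b) * c

def expected_unit_cost(x):
    """Expected sum of unit costs of the sensors actually visited, given X = x."""
    a, b = p_alarm("A", x), p_alarm("B", x)
    cost_a, cost_b, cost_c = (SENSORS[s][0] for s in "ABC")
    return cost_a + a * (cost_b + (1 - b) * cost_c)

def total_cost(c_false_positive, c_false_negative, p_bomb):
    p_fp = (1 - p_bomb) * p_tree_says_one(0)        # assumed joint probability
    p_fn = p_bomb * (1 - p_tree_says_one(1))        # assumed joint probability
    c_util = (1 - p_bomb) * expected_unit_cost(0) + p_bomb * expected_unit_cost(1)
    return c_false_positive * p_fp + c_false_negative * p_fn + c_util

print(p_tree_says_one(0), 1 - p_tree_says_one(1), total_cost(450, 5e9, 1e-6))
```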

Page 137: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

137

Stroud Saeger Experiments
• Stroud-Saeger ranked all trees formed from 3 or 4 sensors A, B, C and D according to increasing tree costs.
• Used cost function defined above.
• Values used in their experiments:

– CA = .25; P(YA=1|X=1) = .90; P(YA=1|X=0) = .10;
– CB = 10; P(YB=1|X=1) = .99; P(YB=1|X=0) = .01;
– CC = 30; P(YC=1|X=1) = .999; P(YC=1|X=0) = .001;
– CD = 1; P(YD=1|X=1) = .95; P(YD=1|X=0) = .05;

– Here, Ci = unit cost of utilization of sensor i.
• Also fixed were: CFalseNegative, CFalsePositive, P(X=1)

Page 138: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

138

Stroud Saeger Experiments: Our Sensitivity Analysis

• We have explored sensitivity of the Stroud-Saeger conclusions to variations in values of these three parameters.

• We estimated high and low values for these parameters.

• n = 3 (use sensors A, B, C)

• We fixed one parameter at a value chosen from its interval of values and then explored the highest ranked tree as the other two were chosen at random in their intervals of values. 10,000 experiments for each pair of fixed values.

• We looked for the variation in the top-ranked tree and how the top-rank related to choice of parameter values.

• Very surprising results.

Page 139: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

139

Conclusions from Sensitivity Analysis

• Considerable lack of sensitivity to modification in parameters for trees using 3 sensors.

• Very few optimal trees.

• Very few boolean functions arise among optimal and near-optimal trees.

• Similar results for trees using 4 sensors.

Page 140: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

140

Stroud Saeger Experiments: Our Sensitivity Analysis

– CFalseNegative was varied between 25 million and 10 billion dollars
• Low and high estimates of direct and indirect costs incurred due to a false negative.

– CFalsePositive was varied between $180 and $720
• Cost incurred due to false positive (4 men * (3–6 hrs) * (15–30 $/hr))

– P(X=1) was varied between 1/10,000,000 and 1/100,000

Page 141: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

141

Frequency of Top-ranked Trees when CFalseNegative and CFalsePositive are Varied

• 10,000 randomized experiments (randomly selected values of CFalseNegative and CFalsePositive from the specified range of values) for the median value of P(X=1).

• The above graph has frequency counts of the number of experiments when a particular tree was ranked first or second, or third and so on.

• Only three trees (7, 55 and 1) ever came first. 6 trees came second, 10 came third, 13 came fourth.

[Figure: frequency (0 to 7000) against tree no. (0 to 60), with separate series for trees ranked 1st, 2nd, 3rd, 4th, and 5th]

Page 142: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

142

• 10,000 randomized experiments for the median value of CFalsePositive.

• Only 2 trees (7 and 55) ever came first. 4 trees came second. 7 trees came third. 10 and 13 trees came 4th and 5th respectively.

Frequency of Top-ranked Trees when CFalseNegative and P(X=1) are Varied

[Figure: frequency (0 to 8000) against tree no. (0 to 60), with separate series for trees ranked 1st, 2nd, 3rd, 4th, and 5th]

Page 143: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

143

• 10,000 randomized experiments for the median value of CFalseNegative.
• Only 3 trees (7, 55 and 1) ever came first. 6 trees came second. 10 trees came third. 13 and 16 trees came 4th and 5th respectively.

Frequency of Top-ranked Trees when P(X=1) and CFalsePositive are Varied

[Figure: frequency (0 to 7000) against tree no. (0 to 60), with separate series for trees ranked 1st, 2nd, 3rd, 4th, and 5th]

Page 144: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

144

Values of CFalseNegative and CFalsePositive when Tree 7 was Ranked First

• This is a graph of CFalsePositive against CFalseNegative values obtained from the randomized experiments. The black dots represent points at which tree 7 scored first rank.

Page 145: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

145

Values of CFalseNegative and CFalsePositive when Tree 55 was Ranked First

• Tree 55 fills up the lower area in the range of CFalseNegative and CFalsePositive values.

Page 146: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

146

Values of CFalseNegative and CFalsePositive when Tree 1 was Ranked First

• Tree 1 fills up the upper area in the range of CFalseNegative and CFalsePositive.

[Figure: scatter plot of CFalsePositive (0 to 900) against CFalseNegative (0 to 10 x 10^9)]

Page 147: 1 Computer Science and Decision Making Fred Roberts, Rutgers University

147

Values of CFalseNegative and CFalsePositive for the Three First Ranked Trees

• Trees 7, 55 and 1 fill up the entire area in the range of CFalseNegative and CFalsePositive among themselves.


Values of CFalseNegative and P(X=1) when Tree 7 was Ranked First

• Tree 7 again fills up the major area in the range of CFalseNegative and P(X=1).


Values of CFalseNegative and P(X=1) when Tree 55 was Ranked First

• Tree 55 fills up the rest of the area in the range of CFalseNegative and P(X=1).


Values of CFalseNegative and P(X=1) for First Ranked Trees

• Together trees 7 and 55 fill up the entire region of CFalseNegative and P(X=1).


Values of CFalsePositive and P(X=1) When Tree 7 was Ranked First

• Tree 7 fills up the major area in the range of CFalsePositive and P(X=1).


Values of CFalsePositive and P(X=1) when Tree 55 was Ranked First

• Tree 55 fills up the upper area in the range of CFalsePositive and P(X=1).


Values of CFalsePositive and P(X=1) when Tree 1 was Ranked First

• Tree 1 fills up the lower area in the range of CFalsePositive and P(X=1).

[Figure: P(X=1) (0 to 1 × 10^-5) plotted against CFalsePositive (0 to 900).]

Values of CFalsePositive and P(X=1) for First Ranked Trees

• Trees 7, 55 and 1 fill up the entire area in the range of CFalsePositive and P(X=1) among themselves.


Stroud Saeger Experiments: Our Sensitivity Analysis: 4 Sensors

• Second set of computer experiments: n = 4 (use sensors A, B, C, D).

• Same values as before.

• Experiment 1: Fix values of two of CFalseNegative, CFalsePositive, P(X=1) and vary the third through its interval of possible values.

• Experiment 2: Fix a value of one of CFalseNegative, CFalsePositive, P(X=1) and vary the other two.

• Do 10,000 experiments each time.

• Look for the variation in the highest ranked tree.
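
The experiment design above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three candidate trees and their error probabilities are hypothetical, the parameter intervals are read off the plot axes in this section, and the total cost is assumed to be the inspection cost plus the expected false positive and false negative costs. The names `TREES`, `RANGES`, `total_cost`, `best_tree` and `run_experiment` are invented for this sketch.

```python
import random
from collections import Counter

# Hypothetical trees: (P(false positive), P(false negative), inspection cost).
# Illustrative numbers only; they are not taken from the slides.
TREES = {
    "tree_a": (0.020, 0.0010, 0.50),
    "tree_b": (0.005, 0.0100, 0.30),
    "tree_c": (0.100, 0.0001, 0.90),
}

# Assumed parameter intervals, read off the plot axes.
RANGES = {
    "c_fn": (0.0, 1e10),     # CFalseNegative
    "c_fp": (0.0, 900.0),    # CFalsePositive
    "p_bomb": (0.0, 1e-5),   # P(X=1)
}


def total_cost(tree, c_fn, c_fp, p_bomb):
    """Assumed cost model: inspection cost plus expected error costs."""
    p_fp, p_fn, c_inspect = tree
    return c_inspect + (1 - p_bomb) * p_fp * c_fp + p_bomb * p_fn * c_fn


def best_tree(c_fn, c_fp, p_bomb):
    """Name of the tree with the lowest total cost for these parameters."""
    return min(TREES, key=lambda name: total_cost(TREES[name], c_fn, c_fp, p_bomb))


def run_experiment(fixed, runs=10_000, seed=0):
    """Hold the parameters in `fixed` constant, draw the rest uniformly,
    and count how often each tree is ranked first."""
    rng = random.Random(seed)
    winners = Counter()
    for _ in range(runs):
        params = {
            name: fixed[name] if name in fixed else rng.uniform(*RANGES[name])
            for name in RANGES
        }
        winners[best_tree(**params)] += 1
    return winners
```

Calling `run_experiment({"c_fp": 450.0, "p_bomb": 5e-6})` corresponds to Experiment 1 (two parameters fixed, CFalseNegative varied), and `run_experiment({"p_bomb": 5e-6})` to Experiment 2 (one parameter fixed). The returned counter gives the frequency with which each tree is ranked first.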


Stroud Saeger Experiments: Our Sensitivity Analysis: 4 Sensors

• Experiment 1: Fix values of two of CFalseNegative, CFalsePositive, P(X=1) and vary the third.


CTot vs CFalseNegative for Ranked 1 Trees (Trees 11485(9651) and 10129(349))

Only two trees ever were ranked first, and one, tree 11485, was ranked first in 9651 out of 10,000 runs.


CTot vs CFalsePositive for Ranked 1 Trees (Tree no. 11485 (10000))

One tree, number 11485, was ranked first every time.


CTot vs P(X=1) for Ranked 1 Trees (Tree no. 11485(8372), 10129(488), 11521(1056))

Three trees dominated first place. Trees 10201(60), 10225(17) and 10153(7) also achieved first rank but with relatively low frequency.


Stroud Saeger Experiments: Our Sensitivity Analysis: 4 Sensors

• Experiment 2: Fix the values of one of CFalseNegative, CFalsePositive, P(X=1) and vary the others.


Frequency of First Ranked Trees when Two Parameters (CFalseNegative and CFalsePositive) were Varied Keeping P(X=1) Constant at Randomly Selected Values.

[Figure: frequency (0 to 2 × 10^5) against tree number (0 to 12000).]

Trees coming first: 9541, 10129, 10153, 10201, 11485, 11521.

10,000 randomized experiments with randomly selected values of P(X=1). The experiments were repeated for 20 different randomly selected values of P(X=1).

Frequency of First Ranked Trees when Two Parameters (CFalseNegative and P(X=1)) were Varied Keeping CFalsePositive Constant at Randomly Selected Values.

[Figure: frequency (0 to 14 × 10^4) against tree number (0 to 12000).]

Trees coming first: 505, 4695, 5105, 5129, 7353, 9541, 10129, 10153, 10201, 10225, 11485, 11521.

10,000 randomized experiments with randomly selected values of CFalsePositive. The experiments were repeated for 20 different randomly selected values of CFalsePositive.

Frequency of First Ranked Trees when Two Parameters (P(X=1) and CFalsePositive) were Varied Keeping CFalseNegative Constant at Randomly Selected Values.

[Figure: frequency (0 to 15 × 10^4) against tree number (0.95 × 10^4 to 1.2 × 10^4).]

Trees coming first: 9541, 10129, 10153, 10201, 10225, 11485, 11521.

10,000 randomized experiments with randomly selected values of CFalseNegative. The experiments were repeated for 20 different randomly selected values of CFalseNegative.

Modeling Sensor Errors

• One Approach to Sensor Errors: Modeling Sensor Operation

• Threshold Model:
– Sensors have different discriminating power
– Many use counts (e.g., gamma radiation counts)
– See if count exceeds threshold
– If so, say attribute is present.


Modeling Sensor Errors

Threshold Model:

• Sensor i has discriminating power Ki, threshold Ti

• Attribute present if counts exceed Ti

• Calculate fraction of objects in each category whose readings exceed T

• Seek threshold values that minimize all costs: inspection, false positive/negative

• Assume readings of category 0 containers follow a Gaussian distribution, and similarly for category 1 containers

• Simulation approach
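
A minimal sketch of the simulation step, written in Python. It assumes that category 0 readings are drawn from a Gaussian with mean 0 and category 1 readings from a Gaussian with mean K (the discriminating power), both with spread 1. The function name `simulate_exceed_fraction` and the values K = 3.0 and T = 1.0 are hypothetical, chosen only for illustration.

```python
import random


def simulate_exceed_fraction(mean, sigma, threshold, n=100_000, seed=0):
    """Estimate the fraction of Gaussian readings that exceed the threshold."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.gauss(mean, sigma) > threshold)
    return hits / n


# Hypothetical sensor: discriminating power K and threshold T.
K, T = 3.0, 1.0

# Category 0 (attribute absent): readings above T are false positives.
false_positive_rate = simulate_exceed_fraction(0.0, 1.0, T)

# Category 1 (attribute present): readings above T are detections.
detection_rate = simulate_exceed_fraction(K, 1.0, T)
```

Raising T lowers both rates, so the search for a good threshold trades false positives against missed detections; the cost terms decide where that trade-off settles.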


Probability of Error for Individual Sensors

• For the ith sensor, the type 1 (P(Yi=1|X=0)) and type 2 (P(Yi=0|X=1)) errors are modeled using Gaussian distributions.
– State of nature X=0 represents absence of a bomb.
– State of nature X=1 represents presence of a bomb.
– i represents the outcome (count) of sensor i.
– Σi is variance of the distributions
– PD = prob. of detection, PF = prob. of false pos.

[Figure: Characteristics of a typical sensor i, showing the distributions P(i|X=0) and P(i|X=1), the discriminating power Ki and the threshold Ti.]

Modeling Sensor Errors

The probability of false positive for the ith sensor is computed as:

P(Yi=1|X=0) = 0.5 erfc[Ti/√2]

The probability of detection for the ith sensor is computed as:

P(Yi=1|X=1) = 0.5 erfc[(Ti-Ki)/(Σi√2)]

erfc = complementary error function: erfc(x) = Γ(1/2, x²)/√π

The following experiments have been done using sensors A, B, C and using:

KA = 4.37; ΣA = 1
KB = 2.9; ΣB = 1
KC = 4.6; ΣC = 1

We then varied the individual sensor thresholds TA, TB and TC from -4.0 to +4.0 in steps of 0.4. These values were chosen since they gave us an “ROC curve” for the individual sensors over a complete range of P(Yi=1|X=0) and P(Yi=1|X=1).
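
The two formulas can be evaluated directly with the standard library's `math.erfc`. The sketch below computes the ROC points for sensors A, B and C over the stated threshold range; the helper names (`false_positive`, `detection`, `thresholds`, `roc_points`) are made up for this example and the step size is a parameter.

```python
import math

# (K, Sigma) for each sensor, as given on the slide.
SENSORS = {"A": (4.37, 1.0), "B": (2.9, 1.0), "C": (4.6, 1.0)}


def false_positive(t):
    """P(Yi=1|X=0) = 0.5 erfc[Ti/sqrt(2)]."""
    return 0.5 * math.erfc(t / math.sqrt(2))


def detection(t, k, sigma):
    """P(Yi=1|X=1) = 0.5 erfc[(Ti-Ki)/(Sigma_i sqrt(2))]."""
    return 0.5 * math.erfc((t - k) / (sigma * math.sqrt(2)))


def thresholds(lo=-4.0, hi=4.0, step=0.4):
    """Evenly spaced thresholds from lo to hi inclusive."""
    count = round((hi - lo) / step) + 1
    return [lo + i * step for i in range(count)]


def roc_points(k, sigma, step=0.4):
    """(false positive, detection) pairs, one per threshold value."""
    return [(false_positive(t), detection(t, k, sigma)) for t in thresholds(step=step)]


roc = {name: roc_points(k, sigma) for name, (k, sigma) in SENSORS.items()}
```

With a step of 0.4 each threshold takes 21 values, so three sensors give 21³ = 9,261 joint settings. A step of 0.2 would give 41 values per sensor and 41³ = 68,921 joint settings, which equals the number of experiments reported for the threshold study.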


Frequency of First Ranked Trees for Variations in Sensor Thresholds

• 68,921 experiments were conducted, as each Ti was varied through its entire range.

• The graph has frequency counts of the number of experiments when a particular tree was ranked first. There are 15 such trees. Tree 37 had the highest frequency of attaining rank one.

[Figure: frequency (0 to 18000) against tree no. (0 to 60).]

Conclusions from Sensitivity Analysis: Recapitulation

• Considerable lack of sensitivity to modification in parameters for trees using 3 or 4 sensors.

• Very few optimal trees.

• Very few boolean functions arise among optimal and near-optimal trees.


Some Complications

• More complicated cost models; bringing in costs of delays

• More than two values of an attribute
– (present, absent, present with probability > 75%, absent with probability at least 75%)
– (ok, not ok, ok with probability > 99%, ok with probability between 95% and 99%)

• Inferring the boolean function from observations (partially defined boolean functions)


Some Research Challenges

• Explain why conclusions are so insensitive to variation in parameter values.

• Explore the structure of the optimal trees and compare the different optimal trees.

• Develop less brute force methods for finding optimal trees that might work if there are more than 4 attributes.

• Develop methods for approximating the optimal tree.


Port of Entry Inspection: Closing Remark

• Recall that the “cost” of inspection includes the cost of failure, including failure to foil a terrorist plot.

• There are many ways to lower the total “cost” of inspection:
– Use more efficient orders of inspection.
– Find ways to inspect more containers.
– Find ways to cut down on delays at inspection lanes.


Concluding Comment

• In recent years, interplay between CS and biology has transformed major parts of Bio into an information science.

• Led to major scientific breakthroughs in biology such as sequencing of human genome.

• Led to significant new developments in CS, such as database search.

• The interplay between CS and SS-DM not nearly as far along.

• Moreover: problems are spread over many disciplines.


Concluding Comment

• However, CS/SS-DM interplay has already developed a unique momentum of its own.

• One can expect many more exciting outcomes as partnerships between computer scientists and social scientists/decision theorists expand and mature.
