
Honours Project Report

COMP 4905

Select topics from Private Computation and Private Computational

Geometry

Millionaire’s Problem

Privacy Preserving Convex Hull Protocol

Private Computation to determine Nearest Pair of Points

Submitted by:
Simardeep Singh Ahuja
Student Number 100726222
Bachelor of Computer Science (Honours)
Carleton University

Submitted to:
Dr. Anil Maheshwari
Professor
School of Computer Science
Carleton University


Acknowledgement

I would like to thank Dr. Anil Maheshwari for providing me the opportunity to work on such interesting topics and for supervising the progress of the project. I would like to thank my sister Rasneek Ahuja, with whom I discussed and validated most of my ideas. She contributed several key ideas and points. She also suggested that I buy a compass. I wish I had listened to her earlier. I would also like to thank my friend Eric Lawless for a discussion on the lower bound on the shortest distance in the Nearest Pair Problem. Last but not least, I am fascinated by the intellect of Dr. Yao and Sandeep Hans et al. The research papers were a pleasure to study. I thank them for their brilliant work.


Table of Contents

1.0 Abstract
1.1 The Millionaire’s Problem
1.2 Private Computation of Joint Convex Hull
1.3 Private Computation of Nearest Pair of Points
2.0 Problem Definitions and Motivation
2.1 The Millionaire’s Problem
  Problem Definition
  Results from Research Paper
  Motivation
2.2 Private Computation of Joint Convex Hull
  Problem Definition
  Results from Research Paper
  Motivation
2.3 Private Computation of Nearest Pair of Points
  Problem Definition
  Results
  Motivation
3.0 Terminology
Spreading
  Spreading
Other terminology
  Terminology for Section 6.0 Onwards
4.0 The Millionaire’s Problem – Yao’s protocol
4.1 My Comments on Yao’s Protocol
4.1.1 An analogy to explain the protocol in simple terms
4.1.2 Can the protocol work without step 4)?
4.1.3 Why spreading is important and what if we don’t have it?
4.1.4 How much spreading is enough spreading?
  The above discussion raises the question:
4.1.5 Is the technique presented in step 4 a good method to create spreading?
4.1.6 My modifications to Yao’s Protocol
4.1.6.1 Add random numbers instead of 1’s
4.1.6.2 Insert random number to push out the jth number
5.0 Privacy Preserving Convex Hull Protocol
5.1 My Comments on Privacy Preserving Convex Hull
5.1.1 The beauty of choosing y_{j+1} as tangential point if y_j, y_{j+1}, xc are collinear
  For cases 1 and 2:
  Case 1:
  Case 2:
5.1.2 Minor technical flaw in the algorithm
5.1.3 The existence of an O(log |P|) algorithm for finding tangents
5.1.4.1 Problems with usage of Yao’s Protocol
  Comparison of distances:
  Comparison of cosines of angles:
5.1.4.2 How to alleviate problems discussed in 5.1.4.1
  Comparison of cosines of angles:
  Comparison of distances:
  Cost of above modifications
6.0 A Heuristic approach to determine the nearest pair of points
6.1 A Brief Explanation
6.2 The iterative part and associated problem
  Termination
6.3 Bounding Circle
6.4 How to obtain a lower bound on the actual distance between optimal solution
Some technical points about Yao’s protocol are:
Log|P| algorithm for finding tangents
  FIND-TANGENTS(xc, Py)
  (Line, point) GO-OUTWARD (L, LIST-OF-POINTS, DIRECTION, xc)
Steps to use the application
Other features
Steps to build the Application again from source code:


1.0 Abstract

The focus of this paper is the study of the following problems:

1.1 The Millionaire’s Problem

The discussion is based on the protocol presented by Dr. Andrew C. Yao in his research paper Protocols for Secure Computation. In his research paper, mathematical operations are used without a detailed explanation of the purpose they serve. First, I present some real-world analogies that fill that gap. The analogies (Section 4.1.1) can be used by someone unfamiliar with mathematics and algorithms to understand the most important concepts in the protocol. Second, the fundamental idea at the heart of Yao’s protocol is to hide information about the variables i or j in the indices (positions) of numbers in a sequence. While this idea is extremely elegant, there are certain technical errors in the paper; these are discussed, and solutions are proposed to remedy them where possible.

1.2 Private Computation of Joint Convex Hull

The discussion is based on the algorithm Privacy Preserving Convex Hull Protocol presented in the research paper On Privacy Preserving Convex Hull by Sandeep Hans et al. The algorithm presented in that paper is very elegant, and some of its subtle points are presented in this paper. The algorithm uses Yao’s Protocol as a service. However, there are some lacunae in its usage of Yao’s protocol. Suggestions on the usage of Yao’s protocol are presented in section 5.1.4.1. Some minor technical errors are also mentioned.

1.3 Private Computation of Nearest Pair of Points

Two parties have a set of points each and wish to determine, among all pairs of points that consist of one point from each party’s set, the pair with the minimum separation between its two points. Ideally, this would be done in such a way that neither party learns any of the points in the other party’s set except the one that is part of the nearest pair. At the time of writing this paper, this is an unsolved problem. A new heuristic approach is presented that strives to minimize the need to share points, while trying to generate a pair of points with separation close to the separation of the nearest pair. The approach presented only works if the two sets of points are disjoint, i.e. if some line can be drawn such that one party’s points are on one side of the line and the other party’s points are on the other side.


2.0 Problem Definitions and Motivation

The biggest motivation for problems in Private Computational Geometry is the swathe of potential practical applications they can have. The three problems are briefly defined below, each with a few words about why one could be motivated to study it.

2.1 The Millionaire’s Problem

Problem Definition

Two people (or m in the general case) possess variables i and j respectively where i and j are within a finite range of integers. They wish to ascertain whether i<j without revealing the values of i and j to each other.

Results from Research Paper

The protocol presented by Dr. Yao solves the problem, although some modifications presented in section 4.1.6 will be needed to make the protocol fully functional. In addition a concept called the ϵ, δ Privacy Constraint is presented which provides a measure of how secure a private computation protocol is.

Motivation

The Millionaire’s Problem provides a method to compare numbers without revealing their values. This has very wide potential applications in private computation of functions - specifically when a comparison of two numbers (one belonging to each party) within a fixed range is required such that the parties don’t know what the other party’s number is. Private Computation of Joint Convex Hull is one such example of application of Yao’s Protocol.

2.2 Private Computation of Joint Convex Hull

Problem Definition

Two parties (or m in the general case) have a set of points each. They wish to determine the convex hull of the combined set of both parties’ points. They want to do so in a private manner, revealing only the points that will be on the final convex hull.

Results from Research Paper

The research paper presents a very elegant approach to solve the problem presented above. It is accompanied by a proof of correctness and complexity analysis.

Motivation

Private Computation of the Joint Convex Hull, besides being interesting, could have very practical uses. Imagine two security forces that want to erect a fence around their combined territory to guard against outside threats. While doing so they may not want to share more information than necessary about where each of them has check-posts, look-outs etc. This algorithm will be just right for that job.

2.3 Private Computation of Nearest Pair of Points

Problem Definition

One party PA owns a set of points A. Another party PB owns a set of points B. Out of all the possible pairs (p,q) such that p∈A and q∈B, let (p’, q’) be the one such that the distance between p’ and q’ is less than the distance between any other p and q. The ideal solution would be to determine p’ and q’ such that p’ and q’ are the only points shared in the process: PA should not have to share any other p∈A with PB, and similarly PB should not have to share any other q∈B with PA.
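To make the target of the definition concrete, here is a minimal non-private baseline in Python. It is only a sketch of what (p’, q’) is, assuming both point sets are visible to one program, which is exactly what a private protocol must avoid. The function name and the sample coordinates are illustrative assumptions and do not come from the report or its C# application.

```python
import math
from itertools import product

def nearest_cross_pair(a_points, b_points):
    """Return (p, q, distance) for the closest pair with p from A and q from B.

    Brute force over all |A| * |B| pairs; no privacy is preserved here.
    """
    p, q = min(product(a_points, b_points),
               key=lambda pair: math.dist(pair[0], pair[1]))
    return p, q, math.dist(p, q)

# Hypothetical sample data: A lies to the left of B.
A = [(0.0, 0.0), (1.0, 3.0), (2.0, 1.0)]
B = [(5.0, 1.0), (6.0, 4.0), (4.0, 5.0)]
print(nearest_cross_pair(A, B))
```

Any private or heuristic approach can be compared against this baseline on small inputs to see how close its tentative pair is to the true nearest pair.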

Results

I have created a heuristic approach that tries to find a pair of points whose separation is as close as possible to that of the optimal solution (the pair of points that is truly nearest) while trying to minimize the number of points that need to be shared. It does not guarantee that the nearest pair will actually be found: the algorithm relies on an elimination technique that is used iteratively to rule out points. It is possible that at some iteration no points are eliminated, so we never get one or both of the points that actually constitute the nearest pair. However, I present some ideas on how the parties PA and PB can, at the end of any iteration, calculate a lower bound on the distance between the nearest pair and see how close the distance between the tentative nearest pair is to that bound. I have also implemented a working example of the algorithm in C#; instructions for using the program can be found in Appendix C.

Motivation

Imagine there is a company that manufactures bombs for the army. There is another company that manufactures missile warheads. Suppose the Bomb company needs to send a shipment to the Warhead company (or vice versa). Both companies need to maintain as much secrecy as possible about their locations. Still, they need to find the pair of locations, one belonging to each company, such that the distance between the two locations is the minimum. This could be to save on transportation by air (or to approximate a road route) or to minimize exposure to ambush.

3.0 Terminology

Spreading

For the sake of convenience I define a concept here that I refer to during the discussion of Yao’s protocol.

Spreading:

1. An operation that ensures that the numbers in a list differ from one another by at least 2 modulo p for a given p.
2. The above definition with a number greater than 2 instead of 2.

Enough/Good Spreading:

3. A list of numbers after an operation as in 1. or 2. has been applied.

It will be clear from the context which meaning is implied.
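The definition can be turned into a small check. The sketch below is an illustration only, and it assumes one particular reading of "differ by at least 2 modulo p": the absolute difference of the residues, which is the reading used later in section 4.1.5. The function name has_spreading is hypothetical.

```python
from itertools import combinations

def has_spreading(numbers, p, min_diff=2):
    """True if every pair of residues mod p differs by at least min_diff.

    Assumption: "difference mod p" means |a % p - b % p|.
    """
    residues = [n % p for n in numbers]
    return all(abs(a - b) >= min_diff for a, b in combinations(residues, 2))

# A list whose values are 2 apart has spreading for p = 41 ...
print(has_spreading([21, 23, 25, 27, 29, 31, 33, 35, 37, 39], 41))  # True
# ... while ten consecutive values do not.
print(has_spreading(list(range(33, 43)), 41))  # False
```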

Other terminology

For sections 4.x I follow the terminology used in the research paper by Andrew C. Yao. For sections 5.x I follow the terminology used in the research paper by Sandeep Hans et al. While I make an attempt to include the necessary symbols and meanings, familiarity with the terminology of the original research papers will be very helpful in understanding the contents of this paper.


Terminology for Section 6.0 Onwards

|S| represents the cardinality of the set S. |ab| represents the distance between two points a and b.

Some other terms are introduced as and when needed in relevant sections.

4.0 The Millionaire’s Problem – Yao’s protocol

COMMENTS:

It is not explicitly mentioned in the text of the original paper that the N-bit number chosen is called x, but that is the only conclusion that seems to make sense here.

The only purpose the +1 part seems to serve is that Alice does not have to add 1. Sending k-j would be equivalent; Bob is just being nice.

Why p has to be of N/2 bits is still not clear to me, but for now it does not matter, as this step will not work anyway (discussed in a later section).

This step has a typo: “Zi +1” should be omitted. If it were not a typo, it would give Bob a very high chance of guessing i.

All references to a numbered step in any section numbered 4.x refer to one of the above steps.


4.1 My Comments on Yao’s Protocol

4.1.1 An analogy to explain the protocol in simple terms

There is a room with 10 coin boxes. Each box has a lock and key mechanism and also a slit on top through which one can drop coins inside. Bob walks in with his own coin box and replaces the jth coin box with his own identical-looking coin box. Since this is his coin box, he will be able to check its contents when he wants to. Now Bob leaves and Alice walks in. She puts a coin in each box after the ith box. Now Alice leaves and Bob walks in again. Bob uses his key and opens his coin box at the jth location. If he sees a coin inside then i<j; if his coin box is empty then i≥j.

4.1.2 Can the protocol work without step 4)?

Yes. Spreading is used to keep Bob from finding out i (as will be discussed in 4.1.3). Without the spreading requirement Bob may be able to cheat, but the functionality is not compromised. That is, if Bob is a very sincere person and chooses to inspect only the jth element, he can simply check whether the jth element equals x to decide whether i≥j (it does) or i<j (it does not). It may also be the case that the list of Yu’s already has enough spreading, so that the first p chosen works. Again, Bob can use the same test (see footnote 1); in this case there is no risk of cheating by Bob.

Example

Suppose N is chosen to be 10. Bob has j=5, Alice has i=3. Bob chooses x = 13 and calculates k = Ea(13) = 105. He sends k-j+1 = 105-5+1 = 101 to Alice. Alice computes Da(101)…Da(110). These Yu’s come out to be: 1, 4, 7, 10, 13, 16, 19, 21, 24, 27. Suppose Alice just accepts the Yu’s as Zu’s and sends over Z1, Z2, Z3, Z4+1, …, Z10+1 = {1, 4, 7, 11, 14, 17, 20, 22, 25, 28}. Bob sees that the jth number is 14≠13, so he knows that i<j. Also, there doesn’t seem to be any way Bob can figure out where Alice started adding the 1s.
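The example can be replayed in a few lines of Python. This is only a sketch of the final exchange, assuming the Yu’s are already given as in the example (the encryption Ea and decryption Da are not modelled); the function names are hypothetical.

```python
def alice_step5(ys, i):
    """Alice's list: the first i values unchanged, 1 added to every later one."""
    return [y if u <= i else y + 1 for u, y in enumerate(ys, start=1)]

def bob_concludes_i_lt_j(received, j, x):
    """Bob inspects only the j-th value (1-based); it differs from x iff i < j."""
    return received[j - 1] != x

ys = [1, 4, 7, 10, 13, 16, 19, 21, 24, 27]  # the Yu's from the example
sent = alice_step5(ys, i=3)
print(sent)                                   # [1, 4, 7, 11, 14, 17, 20, 22, 25, 28]
print(bob_concludes_i_lt_j(sent, j=5, x=13))  # True, i.e. i < j
```

With i=7 instead, the fifth value stays 13 and Bob concludes i≥j.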

4.1.3 Why spreading is important and what if we don’t have it?

Consider getting a set of Yu’s without enough spreading and either just accepting that list as the Zu’s or accepting a p that does not produce a satisfactory spreading. Either way Alice will end up with a list of numbers with bad spreading for performing step 5. For the sake of simplicity, consider the first case: she just takes Zu=Yu for all u, and p is not computed. Alice can simply send Z1, …, Zi, Z_{i+1}+1, …, Z10+1 to Bob; she doesn’t have a p to send.

Example

Let’s assume i = 3 and the Zu’s = Yu’s are {33, 34, …, 42}. Then Alice would send Z1, Z2, Z3, Z4+1, Z5+1, Z6+1, Z7+1, Z8+1, Z9+1, Z10+1. This list would be 33, 34, 35, 37, 38, 39, 40, 41, 42, 43. Just by looking at the list Bob would be able to conclude that, starting with 37, every number has an extra 1 and hence i=3 (the index of 35) (see footnote 2).

Spreading is used to hide the position (i) after which a 1 has been added to each Zu by Alice. To put it simply, if the values are too close together then Bob can tell where they start to differ.
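Bob’s observation in this example amounts to looking for the place where a run of consecutive numbers breaks. A minimal sketch, assuming Bob guesses that the original values were consecutive (the helper name first_jump is hypothetical):

```python
def first_jump(received):
    """Return the 1-based position after which the consecutive run breaks."""
    for u in range(1, len(received)):
        if received[u] - received[u - 1] != 1:
            return u
    return None  # no break found: nothing leaks from the gaps

zs = list(range(33, 43))                  # Zu's = Yu's = 33 .. 42
i = 3
sent = zs[:i] + [z + 1 for z in zs[i:]]   # what Alice sends in step 5
print(sent)                                # [33, 34, 35, 37, 38, 39, 40, 41, 42, 43]
print(first_jump(sent))                    # 3, which is Alice's i
```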

1 Equality mod p is simply the same as equality when the Yu’s equal the Zu’s and there is no p, or when the choice of p is trivial because the Yu’s already have a spreading of at least 2 and so are chosen as the Zu’s.

2 This is in accordance with Yao’s stipulation that 1<i<10. If we allow 1≤i≤10 then Bob will know that i=1 or 3. This is still unacceptable.


4.1.4 How much spreading is enough spreading?

The protocol has left a lacuna in terms of forming the condition for acceptance of a p (which directly determines the amount of spreading achieved in the Zu’s). Assume i=3, the Zu’s are {21, 23, 25, 27, 29, 31, 33, 35, 37, 39} and the p chosen is 41. Clearly, p and the Zu’s are such that the condition in step 4 is satisfied, and Alice can proceed with step 5. As per step 5 she sends: {21, 23, 25, 28, 30, 32, 34, 36, 38, 40}.

Bob will reason from his knowledge of how p is chosen, as follows: since all numbers must have differed by at least 2 (saying mod 41 is irrelevant here as they are already reduced mod 41), and there is a difference of 2 between each successive pair except 25 and 28, where it is more than 2, Alice must have added a 1 to each number starting at 28. Thus he knows i is 3 (see footnote 3).

What if the list was Zu’s = {21, 23, 25, 27, 29, 31, 34, 36, 38, 40}? Alice would send {21, 23, 25, 28, 30, 32, 35, 37, 39, 41}. Bob would notice that there is a gap of more than 2 in (25, 28) and (32, 35) but only a gap of 2 in every other successive pair. He would know that Alice started adding 1 to every number at either 28 (i=3) or 35 (i=6) (see footnote 4). If j is in [4, 6] then Bob can even figure out whether i is 3 or 6, because if i is 3 his x will not be equal to Zj mod p; otherwise he still has a 50% chance of guessing the correct i.

The above discussion raises the question:

Will a p such that all Zu’s differ by 3 in the mod p sense be adequate? How about differing by 4? Or 5? As soon as we choose any specific number as the minimum difference or minimum measure of spreading required, we can always create an example like the one presented above – all the numbers have only the minimum amount of spreading except for one (or a small number) of pairs and these pairs will stand out. To sum up: this approach is not going to work.
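Bob’s reasoning in the two examples of this section can be sketched in Python. The sketch assumes Bob knows the minimum spreading demanded by step 4 (2 here) and treats every gap larger than that as a possible starting point of Alice’s increments; he then discards candidates that contradict the protocol’s own answer about i<j. The function names are hypothetical.

```python
def candidate_positions(received, min_spread=2):
    """1-based positions u whose gap to the next value exceeds min_spread."""
    return [u for u in range(1, len(received))
            if received[u] - received[u - 1] > min_spread]

def narrow_with_comparison(candidates, j, i_less_than_j):
    """Keep only the candidates for i that agree with the protocol's result."""
    return [c for c in candidates if (c < j) == i_less_than_j]

first = [21, 23, 25, 28, 30, 32, 34, 36, 38, 40]
second = [21, 23, 25, 28, 30, 32, 35, 37, 39, 41]

print(candidate_positions(first))    # [3]
print(candidate_positions(second))   # [3, 6]

# With j = 5 the comparison result pins i down exactly ...
print(narrow_with_comparison([3, 6], j=5, i_less_than_j=True))   # [3]
# ... whereas with j = 8 both candidates survive (a 50% guess).
print(narrow_with_comparison([3, 6], j=8, i_less_than_j=True))   # [3, 6]
```

Raising min_spread to 3, 4 or 5 does not help: a list whose gaps are all exactly at the minimum except around position i is flagged in the same way.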

4.1.5 Is the technique presented in step 4 a good method to create spreading?

Even if we ignore the discussions in sections 4.1.2 and 4.1.4, this technique seems flawed. Consider the case of division modulo p = 7 (see footnote 5) to attempt to create spreading in the Yu’s. The numbers ≥ 7 “move under” [0, 6], so to speak, when reduced mod 7. Just as the difference between two numbers can be represented by the horizontal distance between them on the number line, the difference between two numbers mod 7 can be represented by the horizontal distance between the vertical classes they fall in.

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, …

By doing mod 7 this becomes:

VC 0  VC 1  VC 2  VC 3  VC 4  VC 5  VC 6
   0     1     2     3     4     5     6   HC 0
   7     8     9    10    11    12    13   HC 1
  14    15    16    17    18    19    20   HC 2
  21    22     …

HC = Horizontal class. VC = Vertical class

3 Same as footnote 2.

4 In accordance with footnotes 2 and 3, in this case i=1, 3 or 6.

5 I took 7 as an example because in the paper a prime number is used in step 4. The argument presented here actually applies to any positive integer.


Let’s examine the change in different pairs of numbers (Keeping in mind that cases of good spreading are what we are interested in):

If we take two numbers, both from the same horizontal class, the difference between them remains unchanged – this case is not interesting for our purpose.

If we consider one number from one horizontal class and another from a horizontal class such that they are separated by one or more horizontal classes, the difference between them can only decrease. The reason is: Their difference before doing mod 7 was more than 7 and now it has to be less than 7. This case is also not interesting for our purpose.

If we consider two numbers from consecutive horizontal classes, their difference may increase or decrease depending on which vertical class they fall in (It could also remain unchanged if we were talking about an even number instead of 7). This is a case that can possibly be interesting:

o Spreading as needed by our protocol should be such that if two Yu’s didn’t already differ by at least 2, the corresponding Zu’s should. This implies that we should be interested in cases where two numbers differ by 0 or 1 before doing mod 7 but by 2 or more after. The only case is k*7-1 and k*7: the difference changes from 1 to 6.

To generalize for p: the only case in which spreading happens by dividing modulo p is when two Yu’s are of the form k*p-1 and k*p, where k ∈ Z+. Hence we can conclude: unless all the Yu’s that differ by less than 2 are cases of this type, division modulo p will not create spreading. I see no reason why we can expect this to satisfactorily create spreading in an arbitrary list of numbers. This implies that the protocol can get stuck in the loop in step 4 until all random N/2-bit primes are exhausted.
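The claim about k*p-1 and k*p can be checked by enumeration. The sketch below assumes the same notion of difference used in the table above (the absolute difference of the two residues) and only examines a finite range, so it illustrates the argument rather than proving it. The function name is hypothetical.

```python
def pairs_gaining_spread(p, limit, min_diff=2):
    """Pairs (a, b) with b - a in {0, 1} whose residues mod p differ by >= min_diff."""
    gained = []
    for a in range(limit):
        for b in (a, a + 1):          # pairs that differ by 0 or 1 beforehand
            if abs(a % p - b % p) >= min_diff:
                gained.append((a, b))
    return gained

print(pairs_gaining_spread(7, 30))    # [(6, 7), (13, 14), (20, 21), (27, 28)]

# Every such pair has the form (k*p - 1, k*p), for several moduli.
for p in (3, 5, 7, 11, 13):
    assert all(b % p == 0 and a == b - 1 for a, b in pairs_gaining_spread(p, 500))
```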

4.1.6 My modifications to Yao’s Protocol

In sections 4.1.4 and 4.1.5 I presented arguments as to why step 4 in Yao’s protocol does not create the spreading that was apparently intended. Hence, to fix the protocol, we need a new way to create spreading (4.1.6.1) or to remove the need for spreading (4.1.6.2):

4.1.6.1 Add random numbers instead of 1’s

In the analogy in section 4.1.1, if Alice added a random number of coins to each box after the ith box, the (physical) protocol would still work. Bob would check to see whether the jth box has any coins (i<j) or none (i≥j). The same can be applied to Yao’s protocol: in step 5, instead of adding 1 to each number in Z_{i+1}…Z10, Alice should add a different random positive (see footnote 6) number to each number in the sequence. This way the sequence will not be incremented as a group by 1 (as was the case in the examples in sections 4.1.3 and 4.1.4). Bob has no way of guessing anything even if the numbers turn out to be bunched together, because he knows that after a certain position each number is just:

A random number + Da(another unknown number).

When Bob receives the list of numbers, he can simply check whether the jth number is x (i≥j) or not (i<j). However, there is one restriction on the choice of random number. If a certain Yu + random number exceeds 2^N-1, then Bob will know the extra bit of information that i<u. An interesting analogy would be to fill a box with so many coins that the extra ones spill out. The way to remedy this is:

6 Strictly positive. 0 will not work. If, by chance, 0 is added to the number at the jth position, then Bob will end up with a wrong conclusion.


For each Yu after Yi, generate a random number and compute Wu = (Yu + randomNumber) % 2^N. If Wu=Yu, then try another random number until Wu≠Yu. Then Alice sends out the list Y1…Yi, W_{i+1}…W10.
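A minimal sketch of this modification in Python follows. It assumes the Yu’s are non-negative integers below 2^N and uses Python’s random module as a stand-in for whatever source of randomness Alice would really use; the function names and the rng parameter are hypothetical.

```python
import random

def alice_mask(ys, i, n_bits, rng=random):
    """Keep Y1..Yi; replace each later Yu by Wu = (Yu + r) % 2^N with Wu != Yu."""
    modulus = 2 ** n_bits
    out = list(ys[:i])
    for y in ys[i:]:
        w = y
        while w == y:  # retry as the text prescribes (r >= 1 already avoids w == y)
            w = (y + rng.randrange(1, modulus)) % modulus
        out.append(w)
    return out

def bob_concludes_i_lt_j(received, j, x):
    """The j-th value (1-based) still equals x exactly when i >= j."""
    return received[j - 1] != x

ys = [1, 4, 7, 10, 13, 16, 19, 21, 24, 27]  # Yu's from the example in 4.1.2
sent = alice_mask(ys, i=3, n_bits=10)
print(bob_concludes_i_lt_j(sent, j=5, x=13))  # True, i.e. i < j
```

Because every value stays below 2^N, no entry can betray by its size alone that something was added to it.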

4.1.6.2 Insert random number to push out the jth number

Consider another modification to the analogy in section 4.1.1: Alice comes in with her own coin box and puts it right after the ith location, thereby moving the (i+1)th box to the (i+2)th position, the (i+2)th box to the (i+3)th position and so on. Now Bob comes in. He does not check for any coins. He simply checks whether he can open the jth box or not. If he can open it, its position has not changed, which further implies that Alice’s box didn’t push his out and i≥j. Otherwise i<j. This idea can also be ported to Yao’s protocol. Alice just sends out the following list: Y1…Yi, R, Y_{i+1}…Y10. Here R is any random number less than 2^N (for the same reason as in section 4.1.6.1: if R≥2^N, it’s a dead giveaway to Bob, who can immediately recognize R and hence i).
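The insertion variant can be sketched as follows. Two assumptions go beyond the text and are labelled in the code: the Yu’s are taken to be distinct, and R is redrawn until it differs from every Yu, so that the inserted value can never be mistaken for x at the jth position. The function names and the rng parameter are hypothetical.

```python
import random

def alice_insert(ys, i, n_bits, rng=random):
    """Send Y1..Yi, R, Y(i+1)..Y10 with a random R below 2^N."""
    r = rng.randrange(2 ** n_bits)
    while r in ys:  # extra precaution (an assumption, not stated in the text)
        r = rng.randrange(2 ** n_bits)
    return ys[:i] + [r] + ys[i:]

def bob_concludes_i_lt_j(received, j, x):
    """Bob's value is still at position j (1-based) exactly when i >= j."""
    return received[j - 1] != x

ys = [1, 4, 7, 10, 13, 16, 19, 21, 24, 27]  # distinct Yu's, assumed as in 4.1.2
sent = alice_insert(ys, i=3, n_bits=10)
print(len(sent))                              # 11: the list grows by one entry
print(bob_concludes_i_lt_j(sent, j=5, x=13))  # True, i.e. i < j
```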

This concludes the part that, in my opinion, relates to the concepts in Yao’s protocol at a very fundamental level. Appendix A contains some other fairly important points about the protocol as well.


5.0 Privacy Preserving Convex Hull Protocol


5.1 My Comments on Privacy Preserving Convex Hull

5.1.1 The beauty of choosing yj+1 as tangential point if yj, yj+1, xc are collinear

I find it fascinating that the single rule “choose yj+1 as the tangential point if yjyj+1 is collinear with xc” has different implications in terms of whether the closer or the further point on the tangent is chosen, yet in every case the correct point is selected (because the convex hull is built in a clockwise fashion). Although many cases are possible, I demonstrate this here by contrasting two cases:

For cases 1 and 2:

Consider the black lines and points to be the final convex hull under construction. xc is the last point selected for the final hull from X’s private convex hull. T1 and T2 are the tangential lines drawn by Y to determine a candidate from Y’s convex hull Py for the next point. Also assume, to simplify the discussion, that the candidate point from Y is going to win over X’s candidate (i.e. Y’s point gets chosen for the final convex hull). ...yi, yi+1... and ...y’i, y’i+1... are points that occur on Y’s convex hull in the clockwise direction.

Case 1:

Since T1 has a greater angle of inclination we know that the tangential point selected as a candidate will be on T1.

For T1: The algorithm will choose yi+1 over yi, which ensures that the further and hence more inclusive point is chosen.

For T2: The algorithm will choose y’i+1 as the tangential point. This is correct despite being the point closer to xc, since the choice of point on T2 is immaterial: the point that is selected on this tangent is not going to make it to the final convex hull anyway.

[Figure: Case 1]


Case 2:

Since T1 has a greater angle of inclination we know that the tangential point selected as a candidate will be on T1:

For T1: The algorithm will choose y’i+1 over y’i, which ensures that the further and hence more inclusive point is chosen.

For T2: The algorithm will choose yi+1 as the tangential point. This is correct despite being the point closer to xc, since the choice of point on T2 is immaterial: the point that is selected on this tangent is not going to make it to the final convex hull anyway.

5.1.2 Minor technical flaw in the algorithm

One case has been left out: a player draws tangents from the last selected point (which is on the other player’s private convex hull) to her/his own convex hull, and that point is collinear with two edges of her/his private convex hull. In other words, the case in which Y draws tangents from xc and both tangents have 2 points of Py on them is not covered. However, the strategy presented to deal with one such tangent extends elegantly to cover the case with two such tangents.

5.1.3 The existence of an O(log |P|) algorithm for finding tangents

The paper does not go into any detail about how the O(log |Py|) algorithm would work to find the tangents from a given point to a convex polygon (which happens to be the private convex hull of Y in our case). This was quite intriguing for me. Initially it seemed to me that an algorithm for finding tangents had to take at least linear time, but since it was used in the research paper I tried to create one with log |P| running time. Based on some concepts that I found by searching on the internet (intersections of lines, determining on which side of a line a point lies, etc.), I was able to create one. It is presented in Appendix B.
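The "which side of a line does a point lie on" test mentioned above is commonly implemented with a 2D cross product. The sketch below is a generic illustration, not code from the paper or from the submitted program; the function name side_of_line is hypothetical and points are assumed to be (x, y) tuples.

```python
def side_of_line(a, b, p):
    """Sign of the cross product (b - a) x (p - a).

    > 0: p is to the left of the directed line a -> b
    < 0: p is to the right of it
    = 0: a, b and p are collinear
    """
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
```

With a = xc and b = a point of Py, this is the kind of primitive that the left/right tests in Appendix B rely on.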

5.1.4.1 Problems with usage of Yao’s Protocol

Yao’s protocol is abstracted away by the use of COMPARE(a,b). We need to either replace Yao’s protocol with another protocol for COMPARE(a,b) or modify the usage to suit our needs. In fact, COMPARE(a,b) is used in two slightly different ways in the Privacy Preserving Convex Hull Protocol.

Comparison of distances:

Yao’s protocol is designed to work for integers in a certain range. In the absence of any special information about the distances, it makes sense in my opinion to assume that they are real numbers. Since the players A and B (or X and Y) have no information about each other’s points beforehand, there is no agreed range known to both. This is a problem because Yao’s protocol requires i and j to lie in a known range (1<i,j<10 was used in the paper).

[Figure: Case 2]


Comparison of cosines of angles:

Again, we have the problem that our cosines are real numbers. However, we have a little more information that we can exploit in this case: the cosine of any angle between 0° and 180° lies in [-1, 1].

5.1.4.2 How to alleviate problems discussed in 5.1.4.1

Let’s tackle the easier one first:

Comparison of cosines of angles:

For the cosines we can add 1 to each cosine value, thus shifting the range to [0, 2] (or some such convenient range, although in principle this is not a big deal). Then X and Y compare the integer parts of their real numbers (the range for Yao’s protocol in this case becomes [0, 2]). If they have the same digit (i.e. the last step was inconclusive), they compare the digits at the 1st decimal place (this automatically gives a range of [0, 9] for Yao’s protocol). If the result is still inconclusive they compare the digits at the next decimal place, and so on, until:

1. They can conclude whose cosine value is greater OR

2. A predetermined maximum number of decimal place comparisons has been done. Then they can choose to make an exception and share the co-ordinates of their candidate points for this iteration of the xc choosing process.
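The digit-by-digit procedure above can be sketched as follows. All names here (digits_of, compare, compare_cosines) are hypothetical. The compare function is a plain, non-private stand-in for COMPARE(a,b); it returns a three-way result, which with Yao’s protocol would presumably take two runs per digit (one for a≥b and one for b≥a), since a single run only separates i<j from i≥j.

```python
def digits_of(cos_value, places):
    """Shift a cosine from [-1, 1] to [0, 2] and split it into digits.

    Returns [integer part, 1st decimal, 2nd decimal, ...] with `places` decimals.
    """
    scaled = round((cos_value + 1.0) * 10 ** places)   # fixed-point integer
    text = str(scaled).rjust(places + 1, "0")
    return [int(ch) for ch in text]

def compare(a, b):
    """Stand-in for the private COMPARE(a, b): 1 if a > b, -1 if a < b, 0 if equal."""
    return (a > b) - (a < b)

def compare_cosines(cos_x, cos_y, places=4):
    """Return 1 if X's cosine is larger, -1 if Y's is, 0 if still tied."""
    for dx, dy in zip(digits_of(cos_x, places), digits_of(cos_y, places)):
        result = compare(dx, dy)        # one comparison per digit position
        if result != 0:
            return result
    return 0    # inconclusive: the parties fall back to sharing the points
```

Here places plays the role of the predetermined maximum number of decimal place comparisons.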

Comparison of distances:

A and B will need to agree upon a maximum distance⁷ ⁸ or a maximum number of digits⁹ before the decimal point for each distance comparison, in addition to everything already mentioned in Comparison of cosines of angles above. After that they can just use the technique described for the cosines of angles. This inherently implies that the one proposing a maximum range is giving out some information about his distance, and so is the one accepting or rejecting it. This may or may not be acceptable, depending on how paranoid the parties are about giving out information.

Cost of above modifications

The looping mechanism that checks each digit from the most significant to the least significant (in case of equality for all previous digits) is not so bad probabilistically. The probability of both players getting the same digit in [0,x], assuming each digit to be equally likely and the two players’ digits to be independent, is 1/(x+1). Hence for cosine comparison:

P(1 comparison will suffice) = 2/3. P(the first 2 comparisons are both inconclusive) = (1/3)*(1/10) = 1/30. P(the first 3 comparisons are all inconclusive) = (1/3)*(1/10)*(1/10) = 1/300.

And so on. A similar analysis can be done for the comparison of distances.
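For the cosine case, these probabilities can be tabulated exactly with Python's fractions module. This is a small sketch under the same assumptions as the text (uniform, independent digits; 3 possible values for the integer part and 10 for every decimal place); the function names are hypothetical.

```python
from fractions import Fraction

def prob_first_k_inconclusive(k, first_range=3, digit_range=10):
    """P(the first k digit comparisons all tie), digits uniform and independent."""
    if k == 0:
        return Fraction(1)
    return Fraction(1, first_range) * Fraction(1, digit_range) ** (k - 1)

def expected_comparisons(max_rounds):
    """Expected number of comparisons when at most max_rounds are allowed."""
    # E[T] = sum over k >= 0 of P(T > k), truncated at max_rounds.
    return sum(prob_first_k_inconclusive(k) for k in range(max_rounds))
```

With a cap of 3 rounds this gives 1 + 1/3 + 1/30 = 41/30 comparisons on average, so under these assumptions the loop rarely goes beyond the first digit or two.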

7 Giving a specific [0, some number] range for at least the most significant digit and the default range of [0, 9] for all smaller places.

8 Worst case: we can use the circumference of the earth / 2 ≈ 20000 km if the points are on earth ;)

9 Giving a range of [0, 9] for each position.


6.0 A Heuristic approach to determine the nearest pair of points

Some important points about the algorithm:

It does not guarantee finding the optimal solution.

It is only designed for the case when A and B (the sets of points of the individual players) are mutually exclusive, i.e. separable by a line: some line can be drawn such that all points ∈ A are on one side of it and all points ∈ B are on the other side.

A good solution is considered to be a pair of points (p1, q1) such that the difference between |p1q1| and |p’q’| is as small as is possible at a certain iteration in the algorithm.

It does not fulfill the ideal condition that only the coordinates of the nearest pair be exchanged, but it attempts to minimize the number of points whose coordinates must be shared.

DETERMINE-TENTATIVE-SHORTEST-DISTANCE
DO {
    1. PA performs DRAW-BOUNDING-CIRCLE(A) and calls it CA
    2. PB performs DRAW-BOUNDING-CIRCLE(B) and calls it CB
       (The order of 1 and 2 does not matter; they can be done simultaneously)
    3. PA and PB exchange the equations of the circles CA and CB
    4. PA and PB both:
       a. Determine the line segment L joining the centres of CA and CB
       b. Find the points BCA and BCB¹⁰: the points where L intersects CA and CB respectively.
       (One player could do this entire calculation and share it with the other for the sake of minimizing computation)
    5. PA determines the point ∈ A which is closest to BCA and calls it TNA
    6. PB determines the point ∈ B which is closest to BCB and calls it TNB
    7. PA and PB exchange the co-ordinates of TNA and TNB
    8. They compute |TNA TNB|. This is their tentative shortest distance. They call it TSD.
    9. PA calls ELIMINATE-POINTS(A, CB, TSD)
    10. PB calls ELIMINATE-POINTS(B, CA, TSD)
} WHILE !( (|A|==1 and |B|==1)¹¹ OR both A & B are stuck OR PA is stuck and |B|==1 OR PB is stuck and |A|==1 OR PA and PB are content with a pair with distance TSD )
OUTPUT = TNA, TNB, TSD

ELIMINATE-POINTS(X, Circle, TSD)
    11. For each x ∈ X
    12.     If SHORTEST-DISTANCE-TO-BOUNDARY-OF(Circle, x) ≥ TSD
    13.         X = X – {x} // x is eliminated

SHORTEST-DISTANCE-TO-BOUNDARY-OF(Circle, x)
    14. Return (|x, Centre of Circle| - radius of Circle)

DRAW-BOUNDING-CIRCLE(X)
    15. Among all pairs of points in X, select the pair (a,b) that is furthest apart compared with any other pair
    16. Find the midpoint of the line segment joining a and b. Call it c.
    17. Find the point p ∈ X that is most distant from c. Call this distance Radius.
    18. Draw (or simply determine the equation of) a circle with centre c and radius Radius.

10 BC stands for Border Crossing.

11 This would mean the tentative solution is the optimal solution.
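One pass of the loop (steps 1 to 10) can be sketched in Python as below. This is an illustrative sketch, not the C# implementation described in Appendix C. The function names are hypothetical, points are (x, y) tuples, a circle is a (centre, radius) pair, both computations are done in one place rather than by two parties, and the two circle centres are assumed to be distinct.

```python
import math
from itertools import combinations

def bounding_circle(points):
    """Steps 15-18: centre = midpoint of the farthest pair, radius = farthest point."""
    if len(points) == 1:
        return points[0], 0.0
    a, b = max(combinations(points, 2), key=lambda pair: math.dist(*pair))
    c = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return c, max(math.dist(c, p) for p in points)

def border_crossing(circle, other_centre):
    """Step 4b: point of `circle` on the segment towards the other centre."""
    (cx, cy), r = circle
    d = math.dist((cx, cy), other_centre)      # assumed non-zero
    return (cx + r * (other_centre[0] - cx) / d, cy + r * (other_centre[1] - cy) / d)

def eliminate(points, other_circle, tsd):
    """Steps 11-14: drop every point at least TSD away from the other circle."""
    centre, r = other_circle
    return [p for p in points if math.dist(p, centre) - r < tsd]

def one_iteration(A, B):
    """Steps 1-10 for one pass; returns (TNA, TNB, TSD, reduced A, reduced B)."""
    ca, cb = bounding_circle(A), bounding_circle(B)
    bca, bcb = border_crossing(ca, cb[0]), border_crossing(cb, ca[0])
    tna = min(A, key=lambda p: math.dist(p, bca))
    tnb = min(B, key=lambda p: math.dist(p, bcb))
    tsd = math.dist(tna, tnb)
    return tna, tnb, tsd, eliminate(A, cb, tsd), eliminate(B, ca, tsd)
```

For example, with A = [(0, -3), (0, 3), (2, 1), (1.5, 0)] and B = [(20, -3), (20, 3), (18, 1), (18.5, 0)], one pass selects (2, 1) and (18, 1) as the tentative pair with TSD = 16 and leaves two points in each set.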

6.1 A Brief Explanation

We draw circles around both sets of points A and B. Each circle has all the points of its set either inside or on it. A path from a point ∈ A has to cross the circle around the points in B in order to reach any of them. Hence the shortest distance from a point in A to the circle around B is a lower bound on the distance from that point to any point in B. The reason for determining BCA and BCB is that |BCA BCB| is the shortest distance between the two circles. Hence it is reasonable to take the points closest to these two points as a tentative nearest pair. TSD is the distance between the tentative nearest pair. Since at any iteration of DETERMINE-TENTATIVE-SHORTEST-DISTANCE we are aiming to get a pair of points whose separation is less than TSD, there is no reason to keep any point whose separation from every point of the other player is at least TSD. This is done in ELIMINATE-POINTS(X, Circle, TSD).

6.2 The iterative part and associated problem

The elimination process in ELIMINATE-POINTS(X, Circle, TSD) does not guarantee that any points will be removed. It is possible that the if condition in line 12 does not hold true for any point of a player. This means that he or she is stuck. There is room for improvement here. I leave this as an open exercise. A point that may be useful while considering this is:

The comparison in line 12 depends on the centre and radius of the other player’s circle. If the other player agrees to generate another bounding circle, thereby changing the test in line 12, it may be possible for the player in question to eliminate some more points.

However it is possible that when PA gets stuck because she cannot eliminate any points, PB is able to find a newer TNB with the reduced set B in the next iteration. This new reduced TSD can possibly allow PA to eliminate points in the next iteration. So the algorithm would be really stuck only when both parties are stuck (unable to eliminate points) or when one is stuck and one is left with only one point.

Termination

The case of both players being left with only one point would represent the attainment of the optimal pair. This would be the ideal termination. The players can also decide to terminate if they think that they have a good enough pair, or if they are unwilling to share more points or to do more computation.

6.3 Bounding Circle

The approach I have used to determine the bounding circle was chosen only because it was the easiest to implement and is not, in my opinion, a core aspect of the problem. There are possibly many more ways to determine a bounding circle. In fact, most of the arguments will apply to any other shape that encloses the point sets as well. However, the computation involved may get harder.


6.4 How to obtain a lower bound on the actual distance for the optimal solution

While the distance between BCA and BCB is the absolute shortest distance possible between any pair after both the parties have drawn their circles, we can do slightly better.

Figure 3

In Figure 3 the dark regions in both circles represent where the remaining points can be. The light regions represent the regions in which all the points are eliminated. TNA and TNB are on the arcs ii and v respectively. The small white circle on A’s side is centred at BCA and its radius is the distance between BCA and TNA (and similarly for B). No points can exist in the white zones marked by arcs ii and v because of the way we choose the points TNA and TNB (they have to be the closest points to the border crossing points), although there can be points on the arcs ii and v themselves: it is possible to have more than one point in A (or B) at the same distance from BCA (or BCB). Now we have to estimate the least possible distance between two points in the dark zones:

For any two points that are in the interiors of the dark zones, we can draw a line segment. Since the parties do not actually know where each other’s points are, they must try to estimate the smallest such distance. Any such line segment can be shortened from both sides until it touches one of the arcs i, ii, iii on one side and one of the arcs iv, v, vi on the other side. This implies that we will get our lower bound from points on the arcs, not in the dark pink/blue zones.

(Please see Figure 4.) Let’s name the corner points N1, N2, N3, N4. N1 is the point of intersection of arcs i and ii. N2 is the point of intersection of arcs ii and iii. N3 is the point of intersection of arcs iv and v. N4 is the point of intersection of arcs v and vi.


My claim is that the lengths |N1N3| = |N2N4| represent the lower bound on the distance between any two points from A and B. Let’s say the distance between the points N1 and N3 is d. If we put the needle of a compass on N1 and draw an arc (Arc 1) that passes through N4, it cuts arc v at N4 and at another point, i.e. it cuts arc v at two points. If we retain N1 as the centre and keep increasing the width of the compass, we keep getting arcs/circles that cut arc v at two points (Arcs 2, 3), until Arc 4, which only touches arc v at one point. However, if we go in the opposite direction, i.e. start drawing arcs with centre N1 but radius less than that of Arc 1, we get arcs (Arcs 5 and 6, for example) that cut arc v at only one point¹². Now we can draw smaller and smaller arcs until we get Arc 7, which has radius |N1N3| and centre N1. Since Arc 7 only cuts arc v at N3, N3 is the closest point to N1 on arc v. Now looking at Arcs 1 to 7 cutting arc iv: all arcs with radius greater than that of Arc 7 cut arc iv and go through its interior, whereas Arc 7 cuts arc iv at only one point, N3. Hence N3 represents the single point on B’s side that is closest to N1.

Figure 4

We don’t even need to consider arc vi as clearly only Arc 1 and bigger arcs can cut it. Now we turn this rather narrow claim into something bigger by considering these points:

The proof did not use the actual position of N1 wrt arcs i and ii in the argument. It was only used as the needle point of the compass. The same argument will hold true for any other point on the arcs i, ii or iii.

12 If we extend arc v to draw the complete circle, Arc 5 (and 6) will cut it at another point, but not on arc v itself.


Only if we take as centre a point on the line joining the centres of the bounding circles will we get arcs that pass through both N3 and N4 and do not cut arcs iv, v, vi anywhere else. Even so, it will represent the least distance from that point (possibly on arc ii) to N3 (and N4).

By symmetry of the lower half, an equivalent lower bound is |N2N4|.

The same argument also applies from the opposite side, i.e. drawing arcs with N3 as the centre and finding that the shortest one is the one that passes through N1.

This means that changing either end point of the line segment to any point other than N1 in/on A’s bounding circle and N3 on B’s bounding circle will increase the length of the segment. This implies that the length |N1N3| = |N2N4| = d represents a lower bound on the distance between the nearest pair. Since PA and PB already know the equations of each other’s circles and the points BCA, BCB, TNA and TNB, they can easily compute N1, N2, N3, N4 and hence determine the lower bound. This can help them decide whether they want to stop the iterations, or help them realize how much worse off they are from the lowest possible optimal solution if the iterations get stuck. An important point to note is that this lower bound depends on the bounding circles and the tentative nearest pair selected. It can change at every iteration.
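A sketch of how the corner points and the bound could be computed is given below. It rests on one reading of Figure 3 that should be treated as an assumption: arcs i and iii lie on A’s bounding circle, arc ii lies on the small circle centred at BCA with radius |BCA TNA|, and likewise for arcs iv to vi on B’s side, so that N1 and N2 (N3 and N4) are the intersections of the two circles on each side. The function names are hypothetical.

```python
import math

def circle_intersections(c0, r0, c1, r1):
    """Both intersection points of two circles (assumes they do intersect)."""
    d = math.dist(c0, c1)
    a = (r0 * r0 - r1 * r1 + d * d) / (2 * d)       # distance from c0 to the chord
    h = math.sqrt(max(0.0, r0 * r0 - a * a))        # half the chord length
    mx = c0[0] + a * (c1[0] - c0[0]) / d
    my = c0[1] + a * (c1[1] - c0[1]) / d
    ox, oy = h * (c1[1] - c0[1]) / d, h * (c1[0] - c0[0]) / d
    return (mx + ox, my - oy), (mx - ox, my + oy)

def lower_bound(ca, bca, tna, cb, bcb, tnb):
    """Smallest corner-to-corner distance, i.e. |N1N3| = |N2N4| in the text."""
    corners_a = circle_intersections(ca[0], ca[1], bca, math.dist(bca, tna))
    corners_b = circle_intersections(cb[0], cb[1], bcb, math.dist(bcb, tnb))
    return min(math.dist(n, m) for n in corners_a for m in corners_b)
```

Taking the minimum over the four corner pairs avoids having to decide which intersection point is N1 and which is N2. For circles of radius 3 centred at (0, 0) and (20, 0), with TNA = (2, 1) and TNB = (18, 1), the bound comes out as 44/3 ≈ 14.67, below the tentative distance of 16.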


Appendix A

Some technical points about Yao’s protocol are:

1. In the initial statement of the problem the limits on i, j are described as “1<i,j<10”. This should have been “1≤ i, j≤ 10” as the protocol will work for i and j from 1 to 10, both inclusive.

2. How well Bob can guess i depends heavily on what the values i and j are, no matter how secure the protocol is. For instance, if j is 10 and he concludes i≥j in step 6, then he immediately knows that i=10. An interesting observation is that if j is close to one extreme, let’s say 1 (or 10), and i<j (or i≥j), then Bob has a good chance of guessing i, but if i≥j (or i<j) then he does not have as good a chance of guessing i.

3. When Bob sends k-j+1 = Ea(x)-j+1, he must be cautious about what k value is obtained. If it is too small then it may be necessary to choose a new k. The reason is as follows:

Specific example: k=0, j=5. Bob sends 0-5+1 = -4. Alice knows about the +1 part, so she knows that k-j = -5. She knows that the domain of Ea() is [0, 2^N-1], so the minimum value of k is 0. Therefore the only possibilities for (k,j) such that k-j = -5 are: (0,5), (1,6), (2,7), (3,8), (4,9), (5,10). In general: if Alice receives a negative number c, then she knows j ≥ 1-c.

4. It is required that the choice of p should be such that the Zu’s differ by at least 2 in the mod p sense. Since u runs from 1 to 10, this implies that we need at least 20 congruence classes, so p has to be at least 20. The smallest prime that qualifies is 23. The binary representation of 23 takes a minimum of 5 bits. By combining this argument with the condition in the paper that p has to be an N/2 bit number, we can infer that this protocol will not work for (N/2)<5, i.e. N<10 (or 11), if we follow the wording literally.
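Points 3 and 4 can be checked mechanically. The sketch below uses hypothetical helper names and only the lower limit k ≥ 0 from point 3 (the upper limit 2^N-1 on k is ignored for simplicity).

```python
def possible_pairs(received, j_max=10):
    """All (k, j) with k >= 0 and 1 <= j <= j_max such that k - j + 1 == received."""
    return [(received + j - 1, j)
            for j in range(1, j_max + 1)
            if received + j - 1 >= 0]

def smallest_prime_above(n):
    """Smallest prime strictly greater than n, by trial division."""
    candidate = n + 1
    while any(candidate % d == 0 for d in range(2, int(candidate ** 0.5) + 1)):
        candidate += 1
    return candidate
```

For a received value of -4, possible_pairs lists exactly the six pairs from the example in point 3, and smallest_prime_above(20).bit_length() gives the 5 bits mentioned in point 4.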


Appendix B

Log|P| algorithm for finding tangents

FIND-TANGENTS(xc, Py)
    LEFT-TANGENT, RIGHT-TANGENT = NULL
    LEFT-TANGENTIAL-POINT, RIGHT-TANGENTIAL-POINT = NULL
    LIST-OF-POINTS-ON-LEFT, LIST-OF-POINTS-ON-RIGHT = EMPTY
    Choose a random point yinit on Py
    Draw a line Linit that passes through xc and yinit
    For each point yi on Py
    {
        If (yi is to the left of Linit)
        {
            add yi to LIST-OF-POINTS-ON-LEFT
        }
        else if (yi is to the right of Linit)
        {
            add yi to LIST-OF-POINTS-ON-RIGHT
        }
        else // yi is on Linit
        {
            add yi to LIST-OF-POINTS-ON-LEFT
            add yi to LIST-OF-POINTS-ON-RIGHT
        }
    }
    (LEFT-TANGENT, LEFT-TANGENTIAL-POINT) = GO-OUTWARD(Linit, LIST-OF-POINTS-ON-LEFT, LEFT, xc)
    (RIGHT-TANGENT, RIGHT-TANGENTIAL-POINT) = GO-OUTWARD(Linit, LIST-OF-POINTS-ON-RIGHT, RIGHT, xc)
END

(Line, point) GO-OUTWARD(L, LIST-OF-POINTS, DIRECTION, xc)
    Choose a random point yr from LIST-OF-POINTS
    POSSIBLE-TANGENTIAL-POINT = yr
    Draw a line L2 passing through xc and yr
    For each point ys in LIST-OF-POINTS
    {
        If (ys is in the direction opposite to DIRECTION of L2 w.r.t. L)¹³
            Remove ys from LIST-OF-POINTS
        Else if (ys is on L2)
        {
            If (s > r)
                POSSIBLE-TANGENTIAL-POINT = ys
            Remove ys from LIST-OF-POINTS // points on L2 (including yr) are done; without this the list never becomes empty
        }
    }
    If (LIST-OF-POINTS is EMPTY)
    {
        RETURN (L2, POSSIBLE-TANGENTIAL-POINT)
    }
    RETURN (GO-OUTWARD(L2, LIST-OF-POINTS, DIRECTION, xc))
END

13 i.e. discard all the points to the right of L2, if we are going left-ward to find the tangent. Similarly, discard all the points to the left of L2, if we are going right-ward to find the tangent.
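To cross-check the output of FIND-TANGENTS on small inputs, a brute-force scan over all vertices is handy. The sketch below is not the algorithm above and runs in linear time. It assumes that hull lists the vertices of a strictly convex polygon in clockwise order and that x lies strictly outside it; the names cross and tangent_points are hypothetical. The collinear rule of section 5.1.1 (prefer yj+1 over yj) is applied by skipping a vertex whose successor is collinear with x.

```python
def cross(o, a, b):
    """Cross product (a - o) x (b - o); its sign tells on which side of o->a b lies."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def tangent_points(x, hull):
    """Tangential points of the convex polygon `hull` as seen from outside point x."""
    n = len(hull)
    found = []
    for idx, v in enumerate(hull):
        before = cross(x, v, hull[idx - 1])          # side of the previous vertex
        after = cross(x, v, hull[(idx + 1) % n])     # side of the next vertex
        # v is tangential when both neighbours are on the same side of x->v.
        # after == 0 means the next vertex is collinear with x and v, and the
        # rule of section 5.1.1 then prefers that next vertex.
        if after != 0 and before * after >= 0:
            found.append(v)
    return found
```

On the square [(0, 0), (0, 2), (2, 2), (2, 0)], the point (0, -2) yields (0, 2) and (2, 0), where the further point on the collinear edge is chosen, while (0, 4) yields (0, 2) and (2, 2), where the closer one is chosen. This is the contrast discussed in section 5.1.1.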


Appendix C

The implementation program was written in C#. In its present state it is only expected to work on Windows XP or later, as it uses the .NET Framework. The application can be run directly by double-clicking the NearestPairAlgorithm.exe file at the top level of the submitted zip file.

Steps to use the application

1. When the application launches, click on the left half pane to select points for one player and on the right half for the other player.

2. After all points for a player are marked, click the “Tell Player Name to perform local computation” button.

3. After step 2 is done for both players, click the “Tell players A and B to share points” button.

4. Repeat steps 1 to 3.

Other features

5. The text box in the bottom right displays the TSD obtained for each iteration.

[This box doubles as a distance measurement tool. Clicking the “Calculate Distance b/w 2 points” button and then clicking two buttons on the screen shows the distance between the two points in the box.]

6. TNA and TNB are marked black.


7. Removed points are shown in gray. Optionally, deleted points can be removed from the display instead of being discolored. This can be done by locating the line:

private const bool showDeletedPoints = true;

in the file Player.cs, replacing true by false, and rebuilding the solution (instructions below).

Steps to build the application again from source code:

1. The zip file contains a folder NearestPairAlgorithm which contains a file called NearestPairAlgorithm.sln. Open this file with Visual Studio 2005 or later.

2. Click Build → Build Solution from the top menu bar.