

Linear Programming and Clustering

Advisor: Dr. Leonard Schulman, Caltech

Aditya Huddedar, IIT Kanpur


Outline of Talk

Linear Programming
1 Introduction
2 Motivation
3 Our Approach
4 A possible counter-example

Clustering
1 Introduction
2 Observations

Conclusion and Open Problems

References

Acknowledgement

Advisor: Dr. Leonard Schulman, Caltech Aditya Huddedar IIT KanpurLinear Programming and Clustering

Outline of TalkLinear Programming

ClusteringConclusion and Open Problems

ReferencesAcknowledgement

Introduction

Linear Programming

Linear programming (LP) is a mathematical method for determining a way to achieve the most suitable outcome (such as maximum profit or lowest cost) in a given mathematical model for some list of requirements represented as linear relationships.

Linear programs are problems that can be expressed in canonical form:

maximize c · x
subject to Ax ≤ b

where x represents the vector of variables (to be determined), c ∈ R^n and b ∈ R^m are vectors of (known) coefficients, and A ∈ R^(m×n) is a (known) matrix of coefficients.
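As a toy illustration of the canonical form (my own sketch, not part of the talk), the following solves a small two-variable LP by enumerating candidate vertices (the intersections of constraint pairs) and keeping the feasible one with the best objective value. Practical solvers use the simplex or interior-point methods instead.

```python
from itertools import combinations

def solve_2d_lp(c, A, b, eps=1e-9):
    """Maximize c.x subject to A x <= b by brute-force vertex enumeration.

    Works only for bounded two-variable LPs; meant as an illustration
    of the canonical form, not a practical algorithm.
    """
    best = None
    for (i, j) in combinations(range(len(A)), 2):
        (a1, a2), (a3, a4) = A[i], A[j]
        det = a1 * a4 - a2 * a3
        if abs(det) < eps:          # parallel constraints: no vertex here
            continue
        # Solve the 2x2 system of the two active constraints (Cramer's rule).
        x = (b[i] * a4 - a2 * b[j]) / det
        y = (a1 * b[j] - b[i] * a3) / det
        # Keep the intersection only if it satisfies every inequality.
        if all(A[k][0] * x + A[k][1] * y <= b[k] + eps for k in range(len(A))):
            val = c[0] * x + c[1] * y
            if best is None or val > best[0]:
                best = (val, (x, y))
    return best

# maximize x + 2y subject to x <= 1, y <= 1, -x <= 0, -y <= 0 (the unit square)
c = [1.0, 2.0]
A = [[1, 0], [0, 1], [-1, 0], [0, -1]]
b = [1, 1, 0, 0]
print(solve_2d_lp(c, A, b))   # → (3.0, (1.0, 1.0))
```

The optimum of a bounded LP is always attained at a vertex, which is why enumerating vertices suffices here; the catch is that the number of vertices can be exponential in m and n.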


Introduction

Strongly Polynomial Time Algorithm: An algorithm is strongly polynomial if:

1 it consists of the elementary arithmetic operations: addition, comparison, multiplication and division;

2 the number of such steps is polynomially bounded in the dimension of the input;

3 when the algorithm is applied to rational input, the size of the numbers during the algorithm is polynomially bounded in the dimension of the input and the size of the input numbers.
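Condition 3 is the subtle one: each arithmetic operation counts as a single step, yet the numbers it produces can grow. A small illustration of my own (not from the talk): repeated squaring performs only one multiplication per step, but the bit length of the intermediate value roughly doubles each time, so polynomially many operations can still produce exponentially large numbers, which condition 3 forbids.

```python
# Each squaring is one arithmetic operation, but the bit length of x
# roughly doubles per step: the intermediate values grow exponentially
# in the number of steps, even though the step count stays tiny.
x = 3
sizes = []
for _ in range(6):
    x = x * x
    sizes.append(x.bit_length())
print(sizes)  # → [4, 7, 13, 26, 51, 102]
```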


Introduction

Can we solve a standard instance of LP using at most f(m, n) arithmetic operations, where f is a bounded-degree polynomial with no dependence on the description of A, b and c?


Motivation

Simplex and Interior-Point Algorithms

Existing algorithms work in polynomial time, but not strongly polynomial time.

Whether LP admits a strongly polynomial algorithm has been an open question for a long time.

Recently, Vempala and Barasz proposed an approach to strongly polynomial linear programming.


The Affine-Invariant Algorithm

Input: A polyhedron P given by linear inequalities {a_j · x ≤ b_j : j = 1, ..., m}, an objective vector c, and a vertex z.

Output: A vertex maximizing the objective value, or "unbounded" if the LP is unbounded.


The Affine-Invariant Algorithm

while the current vertex z is not optimal do
    H = the set of indices of the inequalities active at z
    for every t ∈ H, compute a vector v_t with a_h · v_t = 0 for h ∈ H \ {t} and a_t · v_t < 0
    T = {t ∈ H : c · v_t ≥ 0} and S = H \ T
    while T ≠ ∅ do
        perform Step 2
return the current vertex


Step 2

(a) For every t ∈ T, compute a vector v_t ≠ 0 with a_h · v_t = 0 for h ∈ H \ {t} and c · v_t ≥ 0, where the length of v_t is the largest value for which z + v_t is feasible.

(b) Compute a non-negative combination v of {v_t : t ∈ T}.

(c) Let λ be maximal such that z + λv ∈ P; if no such maximum exists, return "unbounded". Set z := z + λv.

(d) Let s be the index of an inequality that becomes active. Let t ∈ T be any index such that {a_h : h ∈ {s} ∪ S ∪ T \ {t}} is linearly independent. Set S := S ∪ {s}, T := T \ {t} and H := S ∪ T.
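At a nondegenerate vertex in R^n, exactly n inequalities are active and their normals form an invertible matrix A_H; the ray v_t satisfying a_h · v_t = 0 for h ≠ t and a_t · v_t = -1 is then the t-th column of -A_H^(-1). A two-dimensional sketch of this computation (my own illustration with hypothetical names, not the authors' code):

```python
def improving_rays(A_H, c):
    """At a nondegenerate 2-D vertex with active normals A_H (2x2),
    v_t solves a_h . v_t = 0 for h != t and a_t . v_t = -1, i.e. the
    rays are the columns of -inverse(A_H). Returns the rays and
    T = {t : c . v_t >= 0}, the set of improving directions."""
    (a, b), (d, e) = A_H
    det = a * e - b * d
    assert det != 0, "degenerate vertex: normals not independent"
    # Explicit 2x2 inverse.
    inv = [[e / det, -b / det], [-d / det, a / det]]
    rays = {}
    for t in range(2):
        # v_t = t-th column of -inverse(A_H)
        rays[t] = (-inv[0][t], -inv[1][t])
    T = [t for t, v in rays.items() if c[0] * v[0] + c[1] * v[1] >= 0]
    return rays, T

# Vertex (1, 1) of the unit square, active constraints x <= 1 and y <= 1.
rays, T = improving_rays([[1, 0], [0, 1]], (1, 2))
print(rays, T)
```

At the vertex (1, 1) of the unit square with objective c = (1, 2), both rays point along (-1, 0) and (0, -1) and decrease the objective, so T is empty and the vertex is optimal, matching the outer loop's termination condition.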


The Affine-Invariant Algorithm

Note: if we can show a polynomial bound on the number of steps of the Vempala-Barasz algorithm, it is a strongly polynomial algorithm.

We can prove that the method of choosing the coefficients does not affect the analysis of the algorithm in the Klee-Minty case. We can, in fact, fix the coefficients rather than using the random or centroid method.

The only necessary condition is that all the coefficients be non-negative.


Possibilities

1 The algorithm works in strongly polynomial time for all LP instances, no matter how we combine the improving rays (using non-negative coefficients, of course).

2 The algorithm works in strongly polynomial time for all LP instances, but only when we combine the improving rays in a particular fashion. If so, an adversary can come up with choices at each step to ensure the algorithm takes exponentially many steps.

3 There may exist an LP instance for which the algorithm does not run in polynomial time.


A Possible Counter-example

In the Klee-Minty and Goldfarb-Sit cases, the algorithm visits a facet of the polytope only once, making the algorithm run in strongly polynomial time.

We attempt to construct a counter-example in which the algorithm visits some of the facets more than once. The top (left) and side (right) views of such a polytope are shown in the figure that follows.

In each step of the algorithm, we choose to move along only one of the possible improving rays, rather than taking a combination of all of them.


A Possible Counter-example

[Figure: two views of the constructed polytope, the top view (left) and the side view (right), showing the objective vector, vertices a, b, c, d, e, f, edges labelled 0-7, and facets U, W, X, Y, Z.]


A Possible Counter-example

Let the starting point be a. We have H = {X, Y, Z}.

Running step 1 of the algorithm, we get T = {X, Y} and S = {Z}.

In the first iteration of step 2, we deterministically choose to move along edge 2 to reach vertex b. The modified sets are now T = {Y}, S = {U, Z} and H = T ∪ S.


A Possible Counter-example

In the next iteration of step 2, we deterministically choose to move along edge 4 to reach vertex c. T then becomes empty, and c is the current vertex.

We repeat similar steps from c, moving along edge 5 first and then edge 7 to reach e as the next vertex.

Thus, the algorithm visits the front facet and the back facet alternately (each more than once).


Introduction

In statistics and data mining, k-means clustering is a method of clustering which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.

An optimum clustering is one which has the minimum cost of clustering.

The cost of clustering is defined as Σ_{C∈S} Σ_{x∈C} ||x − ctr(C)||², where ctr(C) is the center of cluster C and S is the set of clusters.
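The cost function above translates directly into code. A minimal sketch (mine, not from the talk) that evaluates the clustering cost of a given partition, taking ctr(C) to be the coordinate-wise mean of the cluster:

```python
def kmeans_cost(clusters):
    """Sum over clusters C of sum over x in C of ||x - ctr(C)||^2,
    where ctr(C) is the coordinate-wise mean of the points in C."""
    total = 0.0
    for C in clusters:
        n, dim = len(C), len(C[0])
        ctr = [sum(x[i] for x in C) / n for i in range(dim)]
        total += sum(sum((x[i] - ctr[i]) ** 2 for i in range(dim)) for x in C)
    return total

# Two well-separated 1-D clusters: the cost counts only within-cluster spread.
print(kmeans_cost([[(0.0,), (2.0,)], [(10.0,), (12.0,)]]))  # → 4.0
```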


Our Approach

A frequently used method to achieve this is to start with k random initial centers (called seeds) and use Lloyd's iterations to move the centers so as to decrease the cost of clustering.

We review the paper "The Effectiveness of Lloyd-type Methods for the k-Means Problem" to study a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.

We implement the seeding algorithm described in the paper and compare it with the standard seeding methods.
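The paper's seeding procedure is more involved than can be reproduced here; the sketch below (my own simplification, with hypothetical helper names) captures the general flavour: each new seed is drawn with probability proportional to its squared distance from the seeds chosen so far, and a Lloyd iteration then reassigns points to their nearest center and recenters.

```python
import random

def d2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def d2_seed(points, k, rng=None):
    """Draw k seeds, each new one with probability proportional to its
    squared distance from the nearest seed so far (a simplification of
    the paper's probabilistic seeding, not its exact procedure)."""
    rng = rng or random.Random(0)
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        weights = [min(d2(p, s) for s in seeds) for p in points]
        seeds.append(rng.choices(points, weights=weights)[0])
    return seeds

def lloyd_step(points, centers):
    """One Lloyd iteration: assign each point to its nearest center,
    then move every center to the mean of its assigned points."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: d2(p, centers[i]))
        clusters[nearest].append(p)
    return [tuple(sum(x[i] for x in C) / len(C) for i in range(len(C[0])))
            if C else c for C, c in zip(clusters, centers)]

pts = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0)]
print(lloyd_step(pts, d2_seed(pts, 2)))
```

With two well-separated groups, the distance-squared weighting makes the second seed land in the opposite group with overwhelming probability, so even a single Lloyd step recovers the two group means.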


Our Approach

We use the built-in program in R (the R Project for Statistical Computing) as the standard against which to compare our results.

We focus in particular on data sets whose clustering is verifiable, e.g. the Iris data set.

We compare the clustering we obtain against the one already provided and observe that almost 95% of the data items are classified correctly using just the seeds, without any Lloyd's iterations.


Observations

Data set                               1-100 data set   Iris data set
Value of k                                  5                3
Cost with built-in program               3375.0            78.93
Cost with just the seeds as centers      3598.0            78.11
Cost with built-in-with-seeds            3375.0            78.93


Observations

Cloud-1 data set

Value of k                                  10           15           20
Cost with built-in program           6302655.8    5073931.9    3188105.4
Cost with just the seeds as centers  6843269.9    4945501.7    4501944.9
Cost with built-in-with-seeds        6303132.5    3658169.9    2683026.6


Observations

Glass data set

Value of k                                6
Cost with built-in program             336.27
Cost with just the seeds as centers    417.1
Cost with built-in-with-seeds          336.2


Conclusion

We rule out the technique used by Vempala and Barasz for proving a polynomial bound on their algorithm in the case of a general LP problem.

Based on our study of clustering, we conclude that Lloyd's iterations might not be needed if the initial seeds are good enough, which was the case in most of the examples we studied.


Open Problems

A strongly polynomial time algorithm for LP is still an open problem.

If we use efficient seeding, can we get rid of Lloyd's iterations altogether, or can we manage with just one Lloyd's iteration?


References

1 A New Approach to Strongly Polynomial Linear Programming, Mihaly Barasz and Santosh Vempala.

2 A Strongly Polynomial Algorithm to Solve Combinatorial Linear Programs, Eva Tardos.

3 Linear Programming, Howard Karloff.

4 Linear Programming, Vasek Chvatal.

5 Wikipedia page on Linear Programming.

6 The Effectiveness of Lloyd-type Methods for the k-Means Problem, Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman and Chaitanya Swamy.

7 UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/


Acknowledgement

I would like to thank Dr. Leonard Schulman for his valuable guidance.

I would also like to thank Dr. Chris Umans for his guidance on the multivariate polynomial interpolation problem.

I would like to thank the SURF committee for giving me the opportunity to present my work.
