Linear Programming and Clustering
Aditya Huddedar, IIT Kanpur
Advisor: Dr. Leonard Schulman, Caltech
Outline of Talk
Linear Programming
  1 Introduction
  2 Motivation
  3 Our Approach
  4 A possible counter-example
Clustering
  1 Introduction
  2 Observations
Conclusion and Open Problems
References
Acknowledgement
Introduction
Linear Programming
Linear programming (LP) is a mathematical method for determining a way to achieve the best outcome (such as maximum profit or lowest cost) in a given mathematical model whose requirements are represented as linear relationships.

Linear programs are problems that can be expressed in canonical form:

    maximize c · x
    subject to Ax ≤ b

where x represents the vector of variables (to be determined), c ∈ R^n and b ∈ R^m are vectors of (known) coefficients, and A ∈ R^{m×n} is a (known) matrix of coefficients.
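As an illustration, a tiny 2-D instance of the canonical form can be solved by enumerating the vertices of the feasible region. This is a hedged sketch for exposition only: `solve_lp_2d` and the example instance are made up here, and practical solvers use simplex or interior-point methods instead of brute-force enumeration.

```python
from itertools import combinations

# Solve max c.x s.t. Ax <= b for a bounded 2-D polytope by checking every
# intersection of two constraint boundaries (every candidate vertex).
def solve_lp_2d(A, b, c):
    best, best_val = None, float("-inf")
    for i, j in combinations(range(len(A)), 2):
        (a11, a12), (a21, a22) = A[i], A[j]
        det = a11 * a22 - a12 * a21
        if abs(det) < 1e-12:          # parallel boundaries: no vertex
            continue
        x = (b[i] * a22 - a12 * b[j]) / det
        y = (a11 * b[j] - b[i] * a21) / det
        # keep the intersection only if it satisfies every constraint
        if all(ai[0] * x + ai[1] * y <= bi + 1e-9 for ai, bi in zip(A, b)):
            val = c[0] * x + c[1] * y
            if val > best_val:
                best, best_val = (x, y), val
    return best, best_val

# maximize x + y subject to x <= 2, y <= 3, -x <= 0, -y <= 0
A = [(1, 0), (0, 1), (-1, 0), (0, -1)]
b = [2, 3, 0, 0]
print(solve_lp_2d(A, b, (1, 1)))   # optimum at the vertex (2, 3)
```

The optimum of a bounded, feasible LP is always attained at a vertex, which is why checking only the vertices suffices here.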
Introduction
Strongly Polynomial Time Algorithm: an algorithm is strongly polynomial if
1 it consists of the elementary arithmetic operations: addition, comparison, multiplication, and division;
2 the number of such steps is polynomially bounded in the dimension of the input; and
3 when the algorithm is applied to rational input, the size of the numbers during the algorithm is polynomially bounded in the dimension of the input and the size of the input numbers.
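Condition 2 can be illustrated by counting operations: computing a dot product c · x takes 2n elementary operations (n multiplications and n additions, as counted below), regardless of how large the entries are. A minimal sketch; `dot_with_count` is an invented helper, not part of any algorithm discussed here.

```python
# Count the elementary arithmetic operations used by a dot product.
# The count depends only on the dimension n, never on input magnitudes.
def dot_with_count(c, x):
    ops = 0
    total = 0
    for ci, xi in zip(c, x):
        total += ci * xi          # one multiplication + one addition
        ops += 2
    return total, ops

print(dot_with_count([1, 2, 3], [4, 5, 6]))               # (32, 6)
print(dot_with_count([10**9, 10**9], [10**9, 10**9])[1])  # still 2n = 4 ops
```

Note the contrast with condition 3: the operation count is fixed by n, but the intermediate numbers can grow, which is why their bit-size must be bounded separately.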
Can we solve a standard instance of LP using at most f(m, n) arithmetic operations, with f a bounded-degree polynomial with no dependence on the description of A, b, c?
Motivation
Simplex and Interior-Point Algorithms
Existing algorithms run in polynomial time, but not in strongly polynomial time.
Whether LP admits a strongly polynomial algorithm has been an open question for a long time.
Recently, Vempala-Barasz proposed an approach to strongly polynomial linear programming.
The Affine-Invariant Algorithm
Input: a polyhedron P given by linear inequalities {a_j · x ≤ b_j : j = 1, ..., m}, an objective vector c, and a vertex z.
Output: a vertex maximizing the objective value, or "unbounded" if the LP is unbounded.
while the current vertex z is not optimal do
    H = the set of indices of active inequalities at z.
    For every t ∈ H, compute a vector v_t with a_h · v_t = 0 for h ∈ H \ {t} and a_t · v_t < 0.
    T = {t ∈ H : c · v_t ≥ 0} and S = H \ T.
    while T ≠ ∅ do
        Perform step 2.
Return the current vertex.
Step 2
(a) For every t ∈ T, compute a vector v_t ≠ 0 with a_h · v_t = 0 for h ∈ H \ {t}, c · v_t ≥ 0, and the length of v_t the largest value for which z + v_t is feasible.
(b) Compute a non-negative combination v of {v_t : t ∈ T}.
(c) Let λ be maximal such that z + λv ∈ P; if no such maximum exists, return "unbounded". Set z := z + λv.
(d) Let s be the index of an inequality which becomes active. Let t ∈ T be any index such that {a_h : h ∈ {s} ∪ S ∪ T \ {t}} is linearly independent. Set S := S ∪ {s}, T := T \ {t}, and H := S ∪ T.
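Step 2(c), finding the maximal feasible step length λ along a direction v, amounts to a ratio test over the constraints. The sketch below is illustrative only; `max_step` and the example polytope are assumptions for exposition, not taken from the Vempala-Barasz paper.

```python
# Given a feasible point z and a direction v, find the largest lambda
# with A(z + lambda*v) <= b, or report unboundedness (lambda = inf).
def max_step(A, b, z, v, eps=1e-12):
    lam = float("inf")
    for a, bi in zip(A, b):
        av = sum(ai * vi for ai, vi in zip(a, v))   # a . v
        if av > eps:                                # constraint tightens along v
            slack = bi - sum(ai * zi for ai, zi in zip(a, z))
            lam = min(lam, slack / av)
    return lam   # inf means the LP is unbounded along v

# square 0 <= x, y <= 2; start at the origin and move along (1, 1)
A = [(1, 0), (0, 1), (-1, 0), (0, -1)]
b = [2, 2, 0, 0]
print(max_step(A, b, z=(0, 0), v=(1, 1)))   # 2.0: we stop at the corner (2, 2)
```

The constraint achieving the minimum ratio is exactly the inequality that "becomes active" in step 2(d).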
Note: if we can show a polynomial bound on the number of steps of the Vempala-Barasz algorithm, then it is a strongly polynomial algorithm.
We can prove that the method of choosing coefficients does not affect the analysis of the algorithm in the Klee-Minty case. We can, in fact, fix the coefficients rather than using the random or centroid method.
The only necessary condition is that all the coefficients be non-negative.
Possibilities
1 The algorithm works in strongly polynomial time for all LP instances, no matter how we combine the improving rays (of course, using non-negative coefficients).
2 The algorithm works in strongly polynomial time for all LP instances, but only when we combine the improving rays in a particular fashion. If so, an adversary may be able to make choices at each step that force the algorithm to take exponentially many steps.
3 There may exist an LP instance for which the algorithm does not run in polynomial time.
A Possible Counter-example
In the Klee-Minty and Goldfarb-Sit cases, the algorithm visits each facet of the polytope only once, so it runs in strongly polynomial time.
We attempt to construct a counter-example in which the algorithm visits some of the facets more than once. The top (left figure) and side (right figure) views of the polytope are shown on the next slide.
In each step of the algorithm, we choose to move along only one of the possible improving rays, rather than taking a combination of all of them.
A Possible Counter-example
[Figure: two views of the candidate counter-example polytope (top view left, side view right), showing the objective vector, vertices a-f, edges 0-7, and facets U, W, X, Y, Z.]
Let the starting point be a. We have H = {X, Y, Z}.
Running step 1 of the algorithm, we get T = {X, Y} and S = {Z}.
In the first iteration of step 2, we deterministically choose to move along edge 2 to reach vertex b. The modified sets are T = {Y}, S = {U, Z}, and H = T ∪ S.
In the next iteration of step 2, we deterministically choose to move along edge 4 to reach vertex c.
Now T becomes empty, and we have reached the next vertex c.
We repeat similar steps from c, moving along edge 5 first and then edge 7, to reach e as the next vertex.
Thus, the algorithm visits the front facet and the back facet alternately (each more than once).
Clustering

Introduction
In statistics and data mining, k-means clustering is a method of clustering that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
An optimum clustering is one with the minimum cost of clustering.
The cost of clustering is defined as Σ_{C∈S} Σ_{x∈C} ||x − ctr(C)||², where ctr(C) is the center of cluster C and S is the set of all clusters.
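The cost above can be computed directly from its definition. A minimal sketch; the helper names and the toy data are illustrative.

```python
# Clustering cost: sum over clusters C of sum over x in C of ||x - ctr(C)||^2,
# where ctr(C) is the mean of the points assigned to C.
def centroid(points):
    d = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(d))

def clustering_cost(clusters):
    cost = 0.0
    for C in clusters:
        c = centroid(C)
        cost += sum(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for x in C)
    return cost

clusters = [[(0.0, 0.0), (2.0, 0.0)],   # centroid (1, 0), cost 1 + 1 = 2
            [(5.0, 5.0), (5.0, 7.0)]]   # centroid (5, 6), cost 1 + 1 = 2
print(clustering_cost(clusters))        # 4.0
```

Using the mean as ctr(C) is optimal for this cost: for a fixed partition, the mean minimizes the sum of squared distances within each cluster.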
Our Approach
A frequently used method is to start with k random initial centers (called seeds) and use Lloyd's iterations to move the centers so as to decrease the cost of clustering.
We review the paper "The Effectiveness of Lloyd-type Methods for the k-Means Problem" to study a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
We implement the seeding algorithm described in the paper and compare it with the standard seeding methods.
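A single Lloyd iteration can be sketched as follows. This is the standard Lloyd update (assign each point to its nearest center, then move each center to the mean of its assigned points), shown for illustration; it is not the seeding procedure from the paper, and the names and data are made up.

```python
def sq_dist(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

# One Lloyd step: nearest-center assignment, then recompute each center
# as the mean of its cluster (empty clusters keep their old center).
def lloyd_step(points, centers):
    clusters = [[] for _ in centers]
    for p in points:
        i = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[i].append(p)
    return [tuple(sum(coord) / len(C) for coord in zip(*C)) if C else ctr
            for C, ctr in zip(clusters, centers)]

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]          # seeds
print(lloyd_step(points, centers))           # [(0.5, 0.0), (10.5, 0.0)]
```

Each such step never increases the clustering cost, which is why Lloyd-type methods converge to a local optimum; the quality of that local optimum is what good seeding is meant to control.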
We use the built-in k-means program in R (the R Project for Statistical Computing) as the standard against which to compare our results.
We particularly focus on data sets whose clustering is verifiable, e.g. the Iris data set.
We compare the clustering we obtain against the one already provided and observe that almost 95% of the data items are classified correctly using just the seeds, without any Lloyd's iterations.
Observations
                                      1-100 data set   Iris data set
Value of k                                  5                3
Cost with built-in program               3375.0           78.93
Cost with just the seeds as centers      3598.0           78.11
Cost with built-in-with-seeds            3375.0           78.93
Cloud-1 data set
Value of k                               10            15            20
Cost with built-in program           6302655.8     5073931.9     3188105.4
Cost with just the seeds as centers  6843269.9     4945501.7     4501944.9
Cost with built-in-with-seeds        6303132.5     3658169.9     2683026.6
Glass data set
Value of k                               6
Cost with built-in program            336.27
Cost with just the seeds as centers   417.1
Cost with built-in-with-seeds         336.2
Conclusion
We rule out the technique used by Vempala-Barasz to prove a polynomial bound on their algorithm for a general LP problem.
Based on our study of clustering, we conclude that Lloyd's iterations might not be needed if the initial seeds are good enough, which was the case in most of the examples we studied.
Open Problems
A strongly polynomial time algorithm for LP is still an open problem.
If we use efficient seeding, can we get rid of Lloyd's iterations, or can we do with just one Lloyd's iteration?
References
1 A New Approach to Strongly Polynomial Linear Programming, Mihaly Barasz and Santosh Vempala
2 A Strongly Polynomial Algorithm to Solve Combinatorial Linear Programs, Eva Tardos
3 Linear Programming, Howard Karloff
4 Linear Programming, Vasek Chvatal
5 Wikipedia page for Linear Programming
6 The Effectiveness of Lloyd-type Methods for the k-Means Problem, Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, Chaitanya Swamy
7 http://archive.ics.uci.edu/ml/
Acknowledgement
I would like to thank Dr. Leonard Schulman for his valuable guidance.
I would also like to thank Dr. Chris Umans for his guidance on the multivariate polynomial interpolation problem.
I would like to thank the SURF committee for giving me this opportunity to present my work.