
Almost optimal solutions to k-clustering problems*

Pankaj Kumar (1) and Piyush Kumar (2)

(1) Purdue University   (2) Florida State University

[Figure 1: three point-set plots with the computed covering shapes. (a) American Cities  (b) Lyria  (c) French Cities]

Fig. 1. Examples of k-center clustering in 2D and 3D for ε = 10^-5.

Abstract. We implement an algorithm for k-clustering for small k in fixed dimensions and report experimental results here. Although the theoretical bounds on the running time of (1+ε)-approximate k-clustering are hopeless, we note that for dimensions 2 and 3, k-clustering is practical for small k (k ≤ 4) and simple enough shapes. For the purposes of this paper, k is a small fixed constant.

1 Introduction

Clustering a set of points has numerous applications [30, 34]. This paper explores k-clustering, a variant of the clustering problem; the k-center problem is a special case of the k-clustering problem.

The k-clustering problem has many applications, such as building hierarchies in graphics applications and facility location [28, 41, 45, 50]. Many special cases and variants of k-clustering have been well studied in theory [17, 20, 26, 31, 33, 44, 46]. k-clustering has also been studied previously for finding approximation algorithms [50], for projective clustering [1] and for defense-related applications [52]. Recently, we have also used k-clustering for surface reconstruction applications [39]. Top-down hierarchies of different shapes such as points, triangles, balls and models are used in graphics and geometry [19, 28, 32, 45].

* Submitted Aug 2006. Revised Dec 2008. Second revision: July 2009.


These hierarchies are created by first clustering the input into a few shapes (e.g. spheres) and then recursively optimizing on those shapes and the part of the input they contain (or are associated with). This computation requires solving the problem of partitioning a given set of input points into clusters that optimize a given objective function. In this paper we consider one such objective function for inputs lying in fixed dimension (we only consider dimensions 2 and 3 here, but the results can be generalized to higher dimensions).

Before we define k-clustering formally, we fix our notation. s is a shape, such as a disk or a rectangle. τ is a set of transformations that is shape preserving (for example, a rectangle maps to a rectangle). Π is the family of shapes generated by the application of τ to s. ρ : Π → R is a measure defined on Π; we overload ρ : Π^k → R to denote the maximum measure of k shapes from Π. P is a set of points. MES(P) ∈ Π denotes the minimum enclosing shape of a set of points P, and we also overload ρ(P) to denote ρ(MES(P)).

Definition (k-clustering): A k-clustering is a covering of P by B = {π_1, π_2, ..., π_k} such that π_i ∈ Π and ρ(π_1) = ρ(π_2) = ... = ρ(π_k) = ρ(B) is minimized over all choices of k shapes from Π. For example, consider the following situation: given a set of cities in the plane, we want to locate k warehouses so as to minimize the maximum distance of a city from its closest warehouse. This is the k-clustering problem where Π is the set of disks in the plane, τ is all translations and dilations of a unit disk, and ρ(s) denotes the radius of a disk in Π. In this problem, we want to cover the set of cities with k disks of equal radius, such that this radius is minimum over all choices of k disks that can cover the cities (see Figure 1). This specific instance of k-clustering, where Π is a set of disks and the input is a set of points, is also called the k-center problem [50]. Even this simple instance of the k-clustering problem is not only NP-hard, but also NP-hard to approximate within any factor better than 2; this can be shown by an easy reduction from the minimum cardinality dominating set problem [37].
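For concreteness, the k-center instance above can be written as a min-max problem (a standard formulation, restated here in our notation rather than taken verbatim from the paper):

    ρ_opt = min over c_1, ..., c_k ∈ R² of  max_{p ∈ P}  min_{1 ≤ i ≤ k} ||p − c_i||

The optimal k disks are then centered at the minimizing c_i and each has radius ρ_opt.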

Fig. 2. A 4-line-center clustering of a data set from TSPLIB with ε = 10^-5, n = 18512.

Another related problem is the k-line-center problem [3] (see Figure 2). Given a set of n points in the plane, we want to cover the input by k strips so that the maximum width of a strip is minimized; a strip in the plane is the region lying between two parallel lines, and the definition generalizes easily to higher dimensions. This problem is also NP-hard to solve, and NP-hard to approximate within any constant factor c > 0 [38]. The k-line-center problem is an instance of the k-clustering problem where Π is the set of strips in the plane.
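In the same spirit, covering P by k strips of common width w centered on lines ℓ_1, ..., ℓ_k is possible exactly when every point lies within w/2 of its nearest line, so the k-line-center objective can be restated (again our restatement, not the paper's) as:

    w_opt / 2 = min over ℓ_1, ..., ℓ_k of  max_{p ∈ P}  min_{1 ≤ i ≤ k} dist(p, ℓ_i)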


Apart from other applications [3], this problem has applications in curve reconstruction from noisy point sets [13]. Both the k-center and the k-line-center problems are easy to solve in two dimensions when k = 1 [4]: they can be solved in O(n) and O(n log n) time respectively [18, 49], and an approximate solution to the 1-line-center problem can be found in almost linear time [12]. The problem becomes considerably harder as k increases, since it turns into an integer programming problem; an increase in dimension also makes the problem harder to solve.

Previous Work: We do not know of any attempts to implement a solution to the k-clustering problem (except when k = 1) [35, 53]. Most previous work related to k-clustering has focused on variants of the k-center problem and the k-line-center problem [2, 3, 10, 20, 28, 35]. One particular variant is the rectilinear k-center problem, where the distance between points is measured in a rectilinear metric (l1 or l∞). This problem has special structure that can be exploited to give very efficient algorithms [6, 47], although these do not generalize to the general k-clustering problem. We are not aware of any attempts to solve the general k-clustering problem in practice, either approximately or exactly. Even in theory, the shapes that have been studied for the k-clustering problem do not seem to include ellipses. Our brief survey of previous work covers only the k-center problem, since our results are most closely related to it.

One of the most elegant approximation algorithms for k-center clustering is the 2-factor approximation algorithm by Gonzalez [25], which can be made to run in O(n log k) time [21]. One of the fastest methods for k-center clustering in 2 and 3 dimensions is by Agarwal and Procopiuc [2], which uses a dynamic programming approach whose running time is bounded by O(n log k) + (k/ε)^O(k^(1-1/d)). Another elegant solution to the k-center clustering problem was given by Badoiu et al. [10]; it is based on the existence of a small set of points in the input called a core-set.

Definition (Core-set): Let P be a set of points (finite or infinite). Let S ⊆ P and let B_S = {π_1, π_2, ..., π_k} ⊂ Π define the optimal k-clustering of S. S is called a core-set of P if a (1+ε)-expansion of B_S covers P. Note that this assumes a natural expansion operation defined on Π.

Solving the k-clustering problem on the core-set gives an approximate solution to the original optimization problem. The method based on core-sets [10] was further refined recently [9, 35]. The core-set methods seem easier to implement than the dynamic programming methods [2, 29], and hence we chose to explore the core-set paradigm for computing k-clusterings in this paper. The idea of taking the furthest violator, adding it to a small set (whose solution is computed by an expensive algorithm), and repeating is not new in optimization; similar methods, such as column generation, chunking and core-sets, are known and used extensively [10, 16, 42]. As far as we know, this is the first implementation that solves the k-clustering problem for small k (k > 1). One specific implementation we should mention is the O(n log n) algorithm implemented in CGAL [22], which uses searching of sorted matrices for rectangular 2-4 centers; that method neither generalizes to higher dimensions nor can easily be extended to work with other shapes. The same holds for the implementation by Hoffmann [27], which in addition fixes k = 3. Since our implementation does not use the specific properties of a particular variant of the general k-clustering problem, it should not be used when those properties can be exploited to code fast specialized implementations.

Another related implementation, recently posted on the web for solving the k-center problem, is by Jurij Mihelic [40]. It solves the k-center problem on graphs by reducing it to the dominating set problem. The implementation in its current state seems buggy, and hence we chose not to compare it with our results in this paper.


Personal communication with the author revealed that the code is still being tested and not quite finished yet. Even for the smallest instance of k-center clustering that we solve in this paper (k = 2, n = 1002, data set = Padberg-Rinaldi, source = TSPLIB), the implementation by Jurij did not terminate on a 3.6GHz P4 with 1GB RAM after 10 minutes. It also gave some wrong results for the improved version of their algorithm, another reason we chose not to compare timings with their implementation. A further difference between that work and ours is that we are after a guaranteed (1+ε)-approximation, whereas they are after a 2-approximation (or a solution near a 2-approximation).

Our Results: In this paper we present and implement an algorithm that computes k-clusterings efficiently and almost exactly for small k. Our algorithm is very general and can easily be adapted to compute k-clusterings for covering shapes like ellipses and rectangles; the assumptions on the covering shapes are mild. We also show that the exponential running time our implementation could encounter does not occur on real-life inputs (at least for small k in fixed dimensions). Our implementation is in C++, can easily be adapted to use new shapes that satisfy the assumptions we state, and works in general dimensions.

2 The Algorithm

The algorithm we experimented with is a variant of the algorithm presented in [9, 35] and is very similar to LP-type algorithms [23, 36]. The main difference from [9, 35] is that it computes the solution almost exactly and gives up the approximation guarantees of the previous algorithms; it also employs a pruning step to reduce the running time. Algorithm 1 shows the algorithm we implemented. It takes as input a set of points and outputs a (1+ε)-approximate k-clustering. Here a (1+ε)-approximate solution means that the output is a set of k shapes from Π whose maximum measure is smaller than ρ_opt and whose expansion by (1+ε) covers the input point set. A useful byproduct of the algorithm is a small core-set for the k-clustering problem.
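Written out, the guarantee on the returned clustering B (our restatement of the preceding sentence) is:

    ρ(B) ≤ ρ_opt ≤ (1+ε) ρ(B)   and   P ⊆ (1+ε)B,

where the second inequality holds because the (1+ε)-expansion of B is itself a feasible covering of P.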

Algorithm 1 Outputs a (1+ε)-approximation of a k-clustering of P and a small core-set.
Require: Input: P ⊂ R^d, M = {M_1, ..., M_k}, B = {π_1, ..., π_k} ⊂ Π, ε > 0, σ > 0.
Require: Invariant: π_i = scaled MES(M_i) such that ρ(π_i) = max_{j=1..k} ρ(M_j).
 1: loop
 2:   p ← point q ∈ P that is the furthest violator of B
 3:   if p ∈ (1+ε)B then
 4:     return M, B
 5:   else
 6:     Sort B (and hence M) using distance from p
 7:     for j = 1 to k do
 8:       if ρ({p} ∪ M_j) > σ then continue
 9:       M′ = M; B′ = B; M′_j = M_j ∪ {p}; π′_j = MES(M′_j)
10:       Scale each π′_i ∈ B′ such that ρ(π′_i) = ρ(B′)
11:       [M′_j, B′_j] ← call k-cluster with M′, B′
12:       Ξ_j ← (1+ε) ρ(B′_j)
13:       if Ξ_j < σ then σ = Ξ_j
14:     end for
15:     return M′_J, B′_J where J ← argmin_{j=1..k} Ξ_j
16:   end if
17: end loop


There are a few assumptions needed to make Algorithm 1 work. We assume that an efficient algorithm is known for the 1-clustering problem for the shape s of interest. Our next assumption is that s should have a notion of a center about which it can be scaled; for example, an ellipse can be scaled keeping its center and rotation matrix fixed, or a strip can be scaled keeping its center-line fixed. Given the union of a set of covering shapes B, a violator is a point q ∈ P with q ∉ B. The furthest violator p ∈ P of B is a point not contained in B with p = argmax_{q ∈ P\B} nn_B(q), where nn_B(q) = min_{q′ ∈ B} ||q′q|| denotes the distance from q to its nearest neighbor in B. For the purposes of this paper, we assume the distance is Euclidean. We assume that, given P and B, an efficient oracle exists that can compute the furthest violator.

The implementation of this oracle depends on the shapes used to define B and P. We assume that P has at most n elements. When P is a set of points or polygons, it is trivial to implement this oracle with B being a union of shapes like balls, strips, ellipses, rectangles or spherical shells; it is non-trivial when both P and B consist of ellipses. The oracle was trivial to implement for the examples of k-clustering given in this paper. The oracle is used in line 2 of Algorithm 1.
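For point inputs, a minimal sketch of this oracle is a linear scan (hypothetical code in the style of the Shape interface of Figure 3 below; ShapeView and furthest_violator are names introduced here for illustration only):

#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

struct Point { double x, y; };

// Any covering shape exposing a point-to-shape distance (cf. Figure 3).
struct ShapeView {
    virtual ~ShapeView() = default;
    virtual double dist(const Point& q) const = 0;  // 0 if q lies inside
};

// Furthest violator of B: the index of the point of P maximizing
// nn_B(q) = min_{q' in B} ||q' q||; returns -1 if P is covered by B.
int furthest_violator(const std::vector<Point>& P,
                      const std::vector<const ShapeView*>& B) {
    int arg = -1;
    double worst = 0.0;
    for (std::size_t i = 0; i < P.size(); ++i) {
        double nnB = std::numeric_limits<double>::infinity();
        for (const ShapeView* s : B) nnB = std::min(nnB, s->dist(P[i]));
        if (nnB > worst) { worst = nnB; arg = static_cast<int>(i); }
    }
    return arg;
}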

The main idea behind our implementation is to mimic an exhaustive enumeration of the core-set to find the best solution [9, 10, 35]. The algorithm is invoked by setting each M_i to the empty set, σ ← ∞, and ε to the desired accuracy level. Each of the sets M = {M_1, ..., M_k} is associated with a scaled minimum enclosing shape from Π. The set of scaled minimum enclosing shapes is described by B = {π_1, ..., π_k} ⊂ Π; here π_i ∈ B is the scaled version of MES(M_i) such that ρ(π_i) = max_{j=1..k} ρ(M_j). The minimum enclosing shape of a null set is defined specially in the implementation to be a shape which contains no points and is infinitely far from any given point. The main loop of the algorithm (Steps 1-17) checks whether the point set is already covered by a (1+ε)-expansion of B, in which case the algorithm terminates (Step 4). Otherwise, there must be points outside the (1+ε)-expansion of B; in this case, the furthest violator of B is put into each of the M_i in turn and the k-clustering computation is done recursively. Note that we are now optimizing over the nodes of a tree whose leaves contain potential solutions of the k-clustering problem. The size of this tree in the worst case can be O(k^n), and the number of oracle calls (computations of nn_B(q)) as well as the number of calls to the 1-clustering computation can also be bounded by O(k^n). We prove the correctness of our algorithm in the next section.

In practice, the running time of our algorithm is much lower than O(k^n), which lets us solve comparatively large problems. This phenomenon seems very similar to the simplex algorithm, for which we have no polynomial-time termination guarantees although it works well in practice; for instance, Figure 1(a) shows an 8-clustering of 13510 cities of the United States (for ε = 10^-5). One reason the running time of our implementation does not seem to depend exponentially on n in practice is that the core-set size appears to depend on k and the dimension but to be independent of n for most real-life inputs. This has also been observed in other implementations based on core-sets [35, 53]. There are some optimizations that we implemented in the algorithm. σ is an upper bound that we maintain for the solution of the k-clustering problem: if the addition of the furthest violator p of B makes ρ(B) > σ, we do not recurse on that path of the tree. This helps immensely in pruning the number of nodes in the tree (see Figure 4). In case more than one M_i is empty, the recursion descends into only one of the empty M_i's (this is not explicitly mentioned in Algorithm 1); this helps start the algorithm faster and prunes the tree in its formation phase. Line 6 of the algorithm also sorts the π_i's (and the corresponding M_i's) with respect to their distances from p. This is a greedy strategy that first explores the branches of the tree that expand the maximum measure of the elements of B by the least amount.
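The following compilable skeleton sketches this branch-and-prune search under our reading of Algorithm 1 (hypothetical code, not the authors' implementation; the greedy sorting of line 6, the rescaling of line 10, and the one-empty-set startup optimization are omitted for brevity, and the (1+ε)-expansion test is specialized to shapes such as disks, whose expansion adds ε·ρ(B) to the covering distance):

#include <algorithm>
#include <cstddef>
#include <limits>
#include <memory>
#include <vector>

struct Point { double x, y; };

// Mirrors the abstract interface of Figure 3; a concrete subclass
// (e.g. a disk backed by a minimum-enclosing-disk solver) is assumed.
struct Shape {
    virtual ~Shape() = default;
    virtual void insert_point(const Point& p) = 0;  // update MES(M_i)
    virtual double dist(const Point& p) const = 0;  // 0 if p lies inside
    virtual double measure() const = 0;             // rho of the shape
    virtual Shape* clone() const = 0;
};

using Clustering = std::vector<std::unique_ptr<Shape>>;

static Clustering copy_of(const Clustering& B) {
    Clustering C;
    for (const auto& s : B) C.emplace_back(s->clone());
    return C;
}

// sigma is the best (1+eps)-approximate measure found so far (the
// pruning bound of line 8); best receives the winning clustering.
void k_cluster(const std::vector<Point>& P, const Clustering& B,
               double eps, double& sigma, Clustering& best) {
    double rho = 0.0;                                // rho(B)
    for (const auto& s : B) rho = std::max(rho, s->measure());
    // Line 2: furthest violator of B.
    const Point* viol = nullptr;
    double worst = 0.0;
    for (const Point& q : P) {
        double d = std::numeric_limits<double>::infinity();
        for (const auto& s : B) d = std::min(d, s->dist(q));
        if (d > worst) { worst = d; viol = &q; }
    }
    // Lines 3-4: P covered by the (1+eps)-expansion of B -> leaf node.
    if (viol == nullptr || worst <= eps * rho) {
        double xi = (1.0 + eps) * rho;               // line 12
        if (xi < sigma) { sigma = xi; best = copy_of(B); }
        return;
    }
    // Lines 7-14: branch on which cluster receives the violator.
    for (std::size_t j = 0; j < B.size(); ++j) {
        Clustering Bp = copy_of(B);
        Bp[j]->insert_point(*viol);
        if (Bp[j]->measure() > sigma) continue;      // line 8: prune
        k_cluster(P, Bp, eps, sigma, best);
    }
}

As described above, the search is invoked with each M_i empty and σ = ∞.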

A similar algorithm to ours that can compute k-center clusterings has a running time of O(2^(1/ε) n) [9, 35]. The main idea of that algorithm is to explore the tree associated with the problem in breadth-first fashion. The problem with this approach is that it takes the full O(2^(1/ε) n) time to run, which is impractical if ε < 0.01 or when the core-set size is large (> 15).

3 Correctness

In this section, we show the correctness of Algorithm 1. First we show that for ε = 0 and P a set of points, Algorithm 1 outputs the correct answer. Initially, none of the points are assigned to any sets. We pick our first point p and assign it to each set one by one. For each of these assignments, we recurse into subproblems and find the assignment of clusters for the remaining points that yields the best answer. Let p belong to cluster j′ in the optimal answer. If assigning p to some j′′ yielded a smaller measure than the assignment to j′, that would contradict the optimality of the solution in which p belongs to cluster j′.

Note that none of our pruning techniques can prevent an optimal solution from being returned. For instance, Line 8 prunes a call in which p increases the measure beyond the current best known measure; even if we had evaluated that branch, it would be eliminated at some point, since a clustering of better measure has already been found.

Another pruning that we employ is in the startup phase. The distance of a point to the minimum enclosing shape of a null set is set to infinity; this distance setting only affects the order in which p is assigned to previously formed clusters. When there is more than one null set, we put a point p into only one null set instead of into every null set one by one. This is because there is no difference between the null sets, and putting p in any of them yields the same optimal solution.

The algorithm clearly defines a tree whose fan-out is at most k and whose height is bounded by n. Let us call this tree T. Let u be the node of the tree corresponding to the optimal solution. It exists in the tree because there is a path from the root to u along which every point added to the clustering was assigned to the correct cluster. Now let us consider ε > 0, which controls the approximation factor of the algorithm.

Suppose the computation now stops at a node v on the root-to-u path because a (1+ε)-expansion of the solution associated with v covers the input. The measure at v is at most the optimal measure, whereas its (1+ε)-expansion is an upper bound on the measure returned by the optimal solution; hence v is a (1+ε)-approximate solution. The algorithm returns the absolute minimum over all solutions explored, and hence returns a solution whose measure is at most that of v and which covers the input after a (1+ε)-expansion. The returned solution is therefore (1+ε)-approximate.

4 Implementation

The k-clustering implementation is carefully designed for easy extensibility and correctness. Our current implementation is in C++. The implementation defines an abstract shape class whose interface is given in Figure 3. We implemented three different k-clusterings by deriving the shape class for specific shapes, namely disks, balls and strips. To implement a particular shape, the user just derives from the shape class and implements its pure virtual functions. The FLOAT type is redefined depending on how we want to calculate distances from the shape. The insert_point function is executed at lines 9-10 of Algorithm 1. Note that the user must maintain an instance of the minimum enclosing shape in the derived class and update it to reflect the insertion of points. For instance, if the derived class is a disk, the user must maintain the minimum enclosing disk of the point list PL whenever a point is inserted into PL and update measure (which equals the radius of the minimum enclosing disk in this case).


class Shape {
protected:
    PointList PL;      // points currently assigned to this shape
    FLOAT measure;     // rho of the minimum enclosing shape of PL

public:
    virtual void insert_point(Point p) = 0;    // add p; update MES and measure
    virtual void draw(FLOAT max_measure) = 0;  // render in LaTeX or OpenGL
    virtual FLOAT dist(Point p) = 0;           // distance from p to the shape
    virtual Shape *clone() = 0;                // copy, used when recursing
};

Fig. 3. An abstract C++ class that needs to be implemented to do k-clustering using our implementation.

draw is used to draw the output either in LaTeX or in OpenGL (http://www.opengl.org/), depending on the required output. dist calculates the distance between a point and the shape; this function is repeatedly called at line 2 of Algorithm 1 to evaluate the furthest violator p. clone returns a copy of the shape and is used while recursing in Algorithm 1 at lines 9-11. Since all these functions are pure virtual, the user is required to redefine them in the derived class. Once these four functions are defined and implemented correctly for a particular shape, the constructor of the class MinKShape computes the k-clustering for that shape and the input point set using Algorithm 1. Our implementation of k-center clustering uses the 1-center code available from Bernd Gartner's web site [24]; the derived shape contains an instance of the class Miniball, which makes it easy for us to implement the other functions needed by the shape class. Our implementation of k-line-center clustering uses the min_strip_2 class implemented in CGAL, which computes the minimum-width strip containing a set of points. In both cases, inserting a point solves the optimization problem from scratch; at the expense of more coding it might be possible to make this computation incremental. Another important choice we made was the number type for the min_strip_2 class: we use the GMP (http://www.swox.com/gmp/) rational number type to calculate the optimal minimum-width strip for a given set of points. We initially used exact arithmetic to implement our algorithm, but the strip-to-point distance computation becomes expensive when everything is done in rational arithmetic (especially for large point sets), so we chose to compute the distance from a point to a line using doubles. The min_strip_2 class still uses GMP rationals for its internal computations and hence computes the solution exactly. The class Miniball also computes the minimum enclosing ball of a set of points exactly (the basis set that defines the solution is exact). The only approximation we introduce is the distance computation from the optimal ball to a given point; the errors in these computations are much smaller than the ε we used in our experiments.
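As a concrete example, here is a hedged sketch of a derived disk shape (hypothetical names throughout; the brute-force min_enclosing_disk below is only a stand-in for the Miniball code the implementation actually calls, exact but O(n^4) and fine for a sketch):

#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Point { double x, y; };
struct Disk { Point center{0.0, 0.0}; double radius = -1.0; };

static Disk disk_from(Point a, Point b) {           // diametral disk
    Disk d;
    d.center = {(a.x + b.x) / 2, (a.y + b.y) / 2};
    d.radius = std::hypot(a.x - b.x, a.y - b.y) / 2;
    return d;
}

static Disk disk_from(Point a, Point b, Point c) {  // circumcircle
    double bx = b.x - a.x, by = b.y - a.y, cx = c.x - a.x, cy = c.y - a.y;
    double d = 2 * (bx * cy - by * cx);             // 0 iff collinear
    double ux = (cy * (bx * bx + by * by) - by * (cx * cx + cy * cy)) / d;
    double uy = (bx * (cx * cx + cy * cy) - cx * (bx * bx + by * by)) / d;
    Disk r;
    r.center = {a.x + ux, a.y + uy};
    r.radius = std::hypot(ux, uy);
    return r;
}

static bool contains(const Disk& d, const std::vector<Point>& P) {
    for (const Point& p : P)
        if (std::hypot(p.x - d.center.x, p.y - d.center.y) > d.radius + 1e-9)
            return false;
    return true;
}

// Stand-in for Miniball: the minimum enclosing disk is determined by
// 2 or 3 support points, so brute force over pairs and triples is exact.
static Disk min_enclosing_disk(const std::vector<Point>& P) {
    if (P.size() == 1) { Disk d; d.center = P[0]; d.radius = 0.0; return d; }
    Disk best;
    best.radius = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < P.size(); ++i)
        for (std::size_t j = i + 1; j < P.size(); ++j) {
            Disk d2 = disk_from(P[i], P[j]);
            if (d2.radius < best.radius && contains(d2, P)) best = d2;
            for (std::size_t l = j + 1; l < P.size(); ++l) {
                Disk d3 = disk_from(P[i], P[j], P[l]);
                if (d3.radius < best.radius && contains(d3, P)) best = d3;
            }
        }
    return best;
}

class DiskShape {                 // would derive from Shape (Figure 3)
    std::vector<Point> PL;
    Disk mes;                     // MES(PL); its radius is the measure
public:
    void insert_point(Point p) {
        PL.push_back(p);
        mes = min_enclosing_disk(PL);   // re-solved from scratch, as in the text
    }
    double measure() const { return PL.empty() ? 0.0 : mes.radius; }
    // The empty shape is "infinitely far" from every point (Section 2).
    double dist(Point p) const {
        if (PL.empty()) return std::numeric_limits<double>::infinity();
        double g = std::hypot(p.x - mes.center.x, p.y - mes.center.y);
        return g > mes.radius ? g - mes.radius : 0.0;
    }
    DiskShape* clone() const { return new DiskShape(*this); }
};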

5 Experimental Results

This section describes the empirical performance of our implementation with disks, balls and strips. The platform we chose for our experiments was a dual-processor Xeon 3.0GHz machine with 1GB RAM. The code does not use threading, hence only one CPU was used for the computation. The 2D data sets we experimented on were taken from the TSPLIB library [43]; the 3D data sets were collected from various sources. The table in Figure 5 gives a brief description of the data sets we used in our experiments.


All results reported in the paper have been averaged over three runs. The timings reported are in seconds and have been rounded to two decimal places.

[Figure 4: plot of the number of nodes evaluated (0 to 500,000) versus the number of circles (2 to 6), before and after pruning.]

Fig. 4. Effect of pruning on the number of nodes evaluated by the algorithm. The input is the German data set from TSPLIB with ε = 10^-4.


dim  Name              # of points
2    Germany           18512
2    France            4461
2    USA               13509
2    Padberg-Rinaldi   1002
2    Reinelt Drilling  1060
3    Ball Joint        137073
3    Bunny             8171
3    Cat               7340
3    Lyria             29970
3    Vase              68097


Fig. 5. Input data table indicating number of points for each data set used and the dimension the data lives in.

As mentioned before, we have experimented with three different covering shapes in this paper, namely disks, balls and strips. We plot four graphs for each of the shapes. The first graph shows the time taken for k-clustering as k increases; for this graph we fixed ε = 0.001. The second graph shows the variation of time with ε when k is fixed. The third graph shows the effect of changing ε on the core-set size, again with k fixed. The fourth graph reports a term called efficiency, defined as the ratio of k^n (the worst-case tree size) to the actual number of tree nodes evaluated; this graph plots the log of efficiency versus the number of shapes and shows that we never come close to the worst case of the algorithm (for this experiment, we again fixed ε = 0.001).


In the case when the covering shape is a disk, k-center clustering appears practical for k ≤ 4 (see Figure 6(a)). In Figure 6(b) the graph flattens out as ε decreases; most of the time this indicates that the solution is optimal and decreasing ε further has no effect on the running time. The same effect can be seen in the core-set size, which does not change below a certain value of ε. The efficiency plot (Figure 6(d)) shows that k^n is huge compared to the number of tree nodes our algorithm explores. Our experiments suggest that k-line-center clustering is a harder problem than the k-center problem (at least for our algorithm): the timings for k-line-center clustering shown in Figure 7(a) are much higher than the timings for similar experiments with both disks and balls. In Figure 7(b) the graph again flattens out as ε decreases, a phenomenon similar to the one observed for the k-center problem; the same happens with the core-set sizes. Note that the core-set sizes for k-line-center clustering are larger than for k-center clustering, another reason the time taken for k-line-center clustering is higher. The time taken for k-center clustering in three dimensions is considerably higher than in two dimensions (see Figure 8(a)), and the core-set sizes also increase with the dimension. This is expected, since as the dimension increases, the set of points that define the solution (and hence the core-set size) for the k-center clustering problem also grows. The experiments for k-center clustering in three dimensions suggest that k-center might be practical for applications requiring k ≤ 4 balls.

Input      Input    Π = {Circles}               Π = {Slabs}                  Π = {Spheres}
Type       Size     Nodes    Core-set  Time     Nodes      Core-set  Time    Nodes     Core-set  Time
                    eval.    size      (s)      eval.      size      (s)     eval.     size      (s)
Lens       100      280      16        0        83230      23        17.12   2065      19        .01
Lens       1000     1020     26        .06      205715     44        67.00   3279      29        .1
Lens       10000    1415     31        .42      577490     59        683.67  18935     33        5.72
Simplex    100      155      16        0        113520     27        24.3    2480      19        .01
Simplex    1000     3845     18        0.11     981725     55        328.0   15495     27        .48
Simplex    10000    2820     33        .7       1269960    58        1389    34120     37        9.63
Gaussian   100      1920     19        .01      92555      25        19.27   5050      19        .03
Gaussian   1000     17650    28        .50      666165     48        207.67  22030     31        .68
Gaussian   10000    19480    28        4.95     1465490    57        1549.0  56110     26        16.29

Table 1. k-clustering with k = 5 and ε = 10^-3 on synthetic data. The three kinds of inputs were: 1. points on the surface of a lens (lens distribution with radius = 1000.0, generated using rbox, a tool distributed with qhull [7]); 2. points uniformly distributed inside a random simplex; 3. points from a Gaussian distribution with mean zero and variance 1.

The total running time of our implementation depends mainly on two factors: the number of tree nodes evaluated and the size of the data set. At each node of the tree, we solve k instances of the 1-clustering problem and find the furthest point in the data set from the union of the 1-clustering solutions (which may have to be expanded to make them equal in measure). The total number of nodes evaluated in turn depends on the core-set size (which depends on the shape of the input). The dependence of the total running time on the number of input points appears to be linear for most point sets (at least for the cases of k-clustering we implemented); Table 1 seems to confirm this for various kinds of synthetic inputs. While running our synthetic experiments we also found that the worst running times came from points distributed on the surface of a unit sphere; Table 2 presents the results when the input comes from such a distribution. Points on the surface of a sphere appear to generate worst-case running times for our implementation because many solutions are close to the optimum, so the algorithm cannot prune the nodes of the tree effectively. Notice that Table 1 has k = 5, a much tougher clustering problem than the k = 2, 3 results of Table 2.
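Schematically (our reading of the two factors above, not a bound proved in the paper), the total running time factors as

    T_total ≈ N_nodes · (k · T_MES + n·k · T_dist),

where N_nodes is the number of tree nodes evaluated, T_MES is the cost of one minimum-enclosing-shape (1-clustering) computation, and T_dist is the cost of one point-to-shape distance evaluation; the linear dependence on n enters through the furthest-violator scan at each node.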


Input        Input    Π = {Circles}             Π = {Slabs}                Π = {Spheres}
Type         Size     Nodes    Core-set  Time   Nodes     Core-set  Time   Nodes      Core-set  Time
                      eval.    size      (s)    eval.     size      (s)    eval.      size      (s)
Sphere k=2   100      110      12        0      172       14        0.04   404        14        0
Sphere k=2   1000     320      14        .02    818       18        .34    6218       19        .24
Sphere k=2   10000    338      15        .12    7130      23        10.54  7672       22        2.95
Sphere k=3   100      288      12        0      966       18        .23    1848       16        .01
Sphere k=3   1000     1251     17        .04    5709      25        2.46   195864     22        7.00
Sphere k=3   10000    5187     19        1.66   20331     31        28.2   126782745  25        42310

Table 2. k-clustering with k = 2, 3 and ε = 10^-3 on points distributed on the surface of a sphere (generated using rbox [7]).

6 Conclusions and Future Work

This paper shows, for the first time, that k-clustering might be practical for simple shapes with small k, large n and very small ε. We believe that contrived examples can be constructed that make our implementation run in exponential time. There are many interesting open problems this work leads to. On the theoretical side, is there a way to explain the running time of our implementation? Is there a way to prune the tree that Algorithm 1 explores further? On the practical side, we would like to implement k-clustering for more complicated shapes like ellipses, ellipsoids and capsules [19]. It would be interesting to experiment with artificially generated data sets of different types, especially ones that drive our implementation to exponential running times. The implementation of the oracles needed by the algorithm for various types of shapes also deserves future work. We plan to put our implementation on the web in the near future. The k-clustering problem is also related to the set cover problem; for the kind of geometric setting we consider in this paper, the VC-dimension of the range space (P, Π) is fixed. We leave as an open problem whether tricks from iterative reweighting schemes and set-cover approximation algorithms can be used to speed up our implementation [5, 8, 14, 51]. Points on a sphere were the worst-case example we found during our experiments (at least for the k-center problem), and it seems that a random perturbation of the input could lead to a polynomial running time. We conjecture that the running time of the k-center algorithm we present can be shown to be polynomial (or pseudo-polynomial) in fixed dimensions using smoothed analysis [48]. Another interesting aspect of our implementation is that it can easily be adapted to kernel spaces if a 1-clustering solution is available in the kernel space; we hope to investigate this in the near future [11, 15].

7 Acknowledgments

The authors would like to thank Joe Mitchell and Alper Yildirim for discussions related to this paper. We are grateful to the Associate Editor and the anonymous referees for their valuable and constructive suggestions. We especially acknowledge an anonymous referee whose comments prompted the addition of Section 3.


[Figure 6 panels: (a) number of centers versus time for k-center clustering of the input, ε = 0.001; (b) time taken by the implementation versus variation in ε, k fixed at 7; (c) core-set size versus ε, k fixed at 7; (d) log(efficiency) versus number of centers for k-center clustering in the plane, ε = 0.001. Curves: Germany, France, Padberg-Rinaldi, Reinelt-drilling, USA.]

Fig. 6. Graphs of various experiments on k-clustering when the covering shape is a disk.


Fig. 7. Graphs of various experiments on k-line-center clustering when the covering shape is a strip. Data sets: Germany, France, Padberg-Rinaldi, Reinelt-drilling, USA. (a) Number of strips versus time (sec) for the k-line-center clustering of the input; ε = 0.001. (b) Time (sec) taken by the implementation versus variation in ε; k is fixed to be 5. (c) Core-set size versus ε; k is fixed to be 5. (d) Log of efficiency versus number of strips for k-line-center clustering in the plane; k = 5 and ε = 0.001.


Fig. 8. Graphs of various experiments on k-clustering when the covering shape is a ball. Data sets: BallJoint, Bunny, Cat, Lyria, Vase. (a) Number of centers versus time (sec) for the k-clustering of the input; ε = 0.001. (b) Time (sec) taken by the implementation versus variation in ε; k is fixed to be 5. (c) Core-set size versus ε; k is fixed to be 5. (d) Log of efficiency versus number of centers for k-center clustering with balls; k = 5 and ε = 0.001.


Bibliography

[1] Pankaj K. Agarwal and Cecilia M. Procopiuc. Approximation algorithms for projective clustering. J. Algorithms, 46(2):115–139, 2003. ISSN 0196-6774. doi: http://dx.doi.org/10.1016/S0196-6774(02)00295-X.

[2] Pankaj K. Agarwal and Cecilia M. Procopiuc. Exact and approximation algorithms for clustering. In Proc. 9th ACM-SIAM Sympos. Discrete Algorithms, pages 658–667, 1998.

[3] Pankaj K. Agarwal and Cecilia Magdalena Procopiuc. Approximation algorithms for projective clustering. In Symposium on Discrete Algorithms, pages 538–547, 2000. URL citeseer.ist.psu.edu/agarwal00approximation.html.

[4] Pankaj K. Agarwal and Micha Sharir. Efficient algorithms for geometric optimization. ACM Computing Surveys, 30(4):412–458, 1998. URL citeseer.ist.psu.edu/article/agarwal98efficient.html.

[5] Pankaj K. Agarwal, Cecilia Magdalena Procopiuc, and Kasturi R. Varadarajan. Approximation algorithms for k-line center. In ESA '02: Proceedings of the 10th Annual European Symposium on Algorithms, pages 54–63, London, UK, 2002. Springer-Verlag. ISBN 3-540-44180-8.

[6] E. Assa and M. J. Katz. 3-piercing of d-dimensional boxes and homothetic triangles. Internat. J. Comput. Geom. Appl., 9(3):249–259, 1999.

[7] C. Bradford Barber, David P. Dobkin, and Hannu Huhdanpaa. The Quickhull algorithm for convex hulls. ACM Trans. Math. Softw., 22(4):469–483, December 1996.

[8] Hervé Brönnimann and Michael T. Goodrich. Almost optimal set covers in finite VC-dimension (preliminary version). In SCG '94: Proceedings of the tenth annual symposium on Computational geometry, pages 293–302. ACM Press, 1994. ISBN 0-89791-648-4. doi: http://doi.acm.org/10.1145/177424.178029.

[9] M. Badoiu and K. L. Clarkson. Smaller core-sets for balls. In Proceedings of 14th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 801–802, 2003.

[10] M. Badoiu, S. Har-Peled, and P. Indyk. Approximate clustering via core-sets. Proc. 34th Annu. ACM Sympos. on Theory of Computing, pages 250–257, 2002.

[11] Y. Bulatov, S. Jambawalikar, P. Kumar, and S. Sethia. Hand recognition using geometric classifiers. In Proceedings of International Conference on Biometric Authentication, volume 3072 of Lecture Notes Comput. Sci., pages 753–759. Springer-Verlag, 2004.

[12] Timothy M. Chan. Faster core-set constructions and data stream algorithms in fixed dimensions. In SCG '04: Proceedings of the twentieth annual symposium on Computational geometry, pages 152–159. ACM Press, 2004. ISBN 1-58113-885-7. doi: http://doi.acm.org/10.1145/997817.997843.

[13] S. W. Cheng, S. Funke, M. Golin, P. Kumar, S. Poon, and E. Ramos. Curve reconstruction from noisy samples. In Proc. 19th Annual ACM Symposium on Computational Geometry, pages 302–311, 2003.

[14] Kenneth L. Clarkson. Las Vegas algorithms for linear and integer programming when the dimension is small. J. ACM, 42(2):488–499, 1995. ISSN 0004-5411. doi: http://doi.acm.org/10.1145/201019.201036.

[15] Nello Cristianini and John Shawe-Taylor. An introduction to Support Vector Machines and other kernel-based learning methods. Cambridge University Press, 2000. URL http://www.support-vector.net.

[16] G. B. Dantzig and P. Wolfe. Decomposition principle for linear programs. Operations Research, 8:101–111, 1960.

[17] Z. Drezner. The p-center problem. European J. Oper. Res., 26(2):312–313, 1986.

[18] M. E. Dyer. On a multidimensional search technique and its application to the Euclidean one centre problem. SIAM J. Comput., 15(3):725–738, 1986. ISSN 0097-5397.

[19] Dave Eberly. 3D Game Engine Design. Morgan Kaufmann, 2001.

[20] D. Eppstein. Faster construction of planar two-centers. In Proc. 8th ACM-SIAM Sympos. Discrete Algorithms, 1997.

[21] T. Feder and D. H. Greene. Optimal algorithms for approximate clustering. In Proc. 20th Ann. ACM Symp. on Theory of Comp., pages 434–444. ACM, 1988.

[22] Kaspar Fischer, Bernd Gärtner, Thomas Herrmann, Michael Hoffmann, Eli Packer, and Sven Schönherr. Geometric optimization. In CGAL Editorial Board, editor, CGAL User and Reference Manual. 3.3 edition, 2007.

Page 15: Almost optimal solutions to -clustering problemscompgeom.com/~piyush/papers/kcenter.pdf · Almost optimal solutions to k-clustering problems? Pankaj Kumar1 and Piyush Kumar2 1 Purdue

[23] Bernd Gärtner. A subexponential algorithm for abstract optimization problems. SIAM J. Comput., 24:1018–1035, 1995.

[24] Bernd Gärtner and Sven Schönherr. An efficient, exact, and generic quadratic programming solver for geometric optimization. In Proc. 16th Annu. ACM Sympos. Comput. Geom., pages 110–118, 2000.

[25] T. F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Comput. Sci., 38:293–306, 1985.

[26] Dan Halperin, Micha Sharir, and Ken Goldberg. The 2-center problem with obstacles. In Proc. 16th Annu. ACM Sympos. Comput. Geom., pages 80–90, 2000.

[27] Michael Hoffmann. A simple linear algorithm for computing rectangular three-centers. In Proc. 11th Canadian Conference on Computational Geometry (CCCG '99), Vancouver (BC), Canada, pages 72–75, 1999.

[28] Philip M. Hubbard. Approximating polyhedra with spheres for time-critical collision detection. ACM Trans. Graph., 15(3):179–210, 1996. ISSN 0730-0301. doi: http://doi.acm.org/10.1145/231731.231732.

[29] R. Z. Hwang, R. C. Chang, and R. C. T. Lee. The searching over separators strategy to solve some NP-hard problems in subexponential time. Algorithmica, 9:398–423, 1993.

[30] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM Computing Surveys, 31(3):264–323, 1999.

[31] M. J. Katz and Micha Sharir. An expander-based approach to geometric optimization. SIAM J. Comput., 26:1384–1408, 1997.

[32] J. T. Klosowski, M. Held, J. S. B. Mitchell, H. Sowizral, and K. Zikan. Efficient collision detection using bounding volume hierarchies of k-DOPs. IEEE Transactions on Visualization and Computer Graphics, 4(1):21–36, 1998. URL citeseer.ist.psu.edu/klosowski96efficient.html.

[33] M. T. Ko and Y. T. Ching. A linear time algorithm for the weighted rectilinear 2-center problem. In Proceedings of the International Workshop on Discrete Algorithms and Complexity, pages 75–82, Fukuoka, Japan, 1989. Institute of Electronics, Information and Communication Engineers (IEICE), Tokyo.

[34] Bernd Kreuter and Till Nierhoff. Greedily approximating the r-independent set and k-center problems on random instances. In Proceedings of the first international workshop on Randomization and Approximation Techniques in Computer Science (RANDOM 97), J. Rolim (ed.), volume 1269 of LNCS, pages 43–53. Springer, 1997. http://www.informatik.hu-berlin.de/~nierhoff/pubs/greedy.html.

[35] P. Kumar, J. S. B. Mitchell, and A. Yıldırım. Approximate minimum enclosing balls in high dimensions using core-sets. The ACM Journal of Experimental Algorithmics, 8, 2003. URL http://www.compgeom.com/meb/.

[36] J. Matoušek, Micha Sharir, and Emo Welzl. A subexponential bound for linear programming. In Proc. 8th Annu. ACM Sympos. Comput. Geom., pages 1–8, 1992.

[37] N. Megiddo and K.J. Supowit. The complexity of some common geometric location problems. SIAM J. on Computing, 13(1):182–196, February 1984.

[38] N. Megiddo and A. Tamir. On the complexity of locating linear facilities in the plane. Operations Research Letters, 13:194–197, 1982.

[39] Amit Mhatre and Piyush Kumar. Projective clustering and its application to surface reconstruction: extended abstract. In SCG '06: Proceedings of the twenty-second annual symposium on Computational geometry, pages 477–478, New York, NY, USA, 2006. ACM Press. ISBN 1-59593-340-9. doi: http://doi.acm.org/10.1145/1137856.1137927. URL http://compgeom.com/meb/socg_qt.mov.

[40] Jurij Mihelič and Borut Robič. Approximation algorithms for k-center problem: an experimental evaluation. In Proc. OR 2002, Klagenfurt, Austria, 2002.

[41] J. Pitt-Francis and R. Featherstone. Automatic generation of sphere hierarchies from CAD data. In Proc. International Conf. on Robotics and Automation, pages 324–329, 1998. http://users.comlab.ox.ac.uk/joe.pitt-francis/icra.ps.gz.

[42] J. Platt. Sequential minimal optimization: A fast algorithm for training support vector machines, 1998. URL citeseer.ist.psu.edu/platt98sequential.html.

[43] G. Reinelt. TSPLIB — a traveling salesman problem library. ORSA Journal on Computing, 3:376–384, 1991.

[44] J. Robert and G. Toussaint. Computational geometry and facility location. Technical Report SOCS 90.20, McGill Univ., Montreal, PQ, 1990.

[45] S. Rusinkiewicz and M. Levoy. QSplat: A Multiresolution Point Rendering System for Large Meshes. In Computer Graphics Proceedings, Annual Conference Series (SIGGRAPH 2000), August 2000.


[46] Micha Sharir. A near-linear algorithm for the planar 2-center problem. In Proc. 12th Annu. ACM Sympos. Comput. Geom., pages 106–112, 1996.

[47] Micha Sharir and Emo Welzl. Rectilinear and polygonal p-piercing and p-center problems. In Proc. 12th Annu. ACM Sympos. Comput. Geom., pages 122–132, 1996.

[48] Daniel Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time. In Proceedings of the thirty-third annual ACM Symposium on Theory of Computing, pages 296–305, 2001.

[49] G. T. Toussaint. Solving geometric problems with the rotating calipers. In Proc. IEEE MELECON '83, pages A10.02/1–4, 1983.

[50] Vijay Vazirani. Approximation Algorithms. Springer-Verlag, 2001.

[51] Emo Welzl. Partition trees for triangle counting and other range searching problems. In Proc. 4th Annu. ACM Sympos. Comput. Geom., pages 23–33, 1988.

[52] B. Xiao, Q. Zhuge, Y. He, Z. Shao, and E. Sha. Algorithms for disk covering problems with the most points. In Proc. IASTED International Conference on Parallel and Distributed Computing and Systems (IASTED PDCS), pages 541–546, 2003.

[53] Hai Yu, Pankaj K. Agarwal, Raghunath Poreddy, and Kasturi R. Varadarajan. Practical methods for shape fitting and kinetic data structures using core sets. In SCG '04: Proceedings of the twentieth annual symposium on Computational geometry, pages 263–272. ACM Press, 2004. ISBN 1-58113-885-7. doi: http://doi.acm.org/10.1145/997817.997858.
