Protecting Individual Interests across Clusters: Spectral Clustering with Guarantees

Shubham Gupta and Ambedkar Dukkipati
Department of Computer Science and Automation
Indian Institute of Science, Bangalore - 560012, INDIA.
[shubhamg, ambedkar]@iisc.ac.in

Abstract

Studies related to fairness in machine learning have recently gained traction due to its ever-expanding role in high-stakes decision making. For example, it may be desirable to ensure that all clusters discovered by an algorithm have high gender diversity. Previously, these problems have been studied under a setting where sensitive attributes, with respect to which fairness conditions impose diversity across clusters, are assumed to be observable; hence, protected groups are readily available. Most often, this may not be true, and diversity or individual interests can manifest as an intrinsic or latent feature of a social network. For example, depending on latent sensitive attributes, individuals interact with each other and represent each other's interests, resulting in a network, which we refer to as a representation graph. Motivated by this, we propose an individual fairness criterion for clustering a graph G that requires each cluster to contain an adequate number of members connected to the individual under a representation graph R. We devise a spectral clustering algorithm to find fair clusters under a given representation graph. We further propose a variant of the stochastic block model and establish our algorithm's weak consistency under this model. Finally, we present experimental results to corroborate our theoretical findings.

1 Introduction

Machine learning has penetrated several application domains and now exhibits a significant influence on society. Notable examples include content recommendations, targeted advertising, ranking search results, and assessment of credit risks, to name a few. As stakes rise, it is essential to ensure that machine learning algorithms are fair, reliable, and transparent. For example, a service clustering news articles by topics must provide a diversity of opinions within each cluster to counter polarization. Towards this end, several recent approaches have presented fair variants of commonly used supervised (Dwork et al., 2012; Hardt et al., 2016; Caton & Haas, 2020) and unsupervised machine learning algorithms (Chierichetti et al., 2017; Celis et al., 2018a,b; Samadi et al., 2018).

Of particular interest in the context of this paper are the fairness notions for clustering (Chierichetti et al., 2017; Bercea et al., 2019; Bera et al., 2019; Kleindessner et al., 2019a; Jung et al., 2020; Mahabadi & Vakilian, 2020). Intuitively, the objective of clustering seems to be at odds with fairness (Chierichetti et al., 2017). If the data has an inherent bias, a clustering algorithm will benefit from exploiting it. However, the solution favored by the algorithm may not be ethical, as pointed out in the example above. Fortunately, several recent studies have shown that it is often possible to find fair clusters while paying only a relatively small price in the clustering algorithm's objective function value (Kleindessner et al., 2019b; Bercea et al., 2019).

Several fairness notions have recently been proposed to quantify the fairness of clusters. Chierichetti et al. (2017) use the disparate impact doctrine to suggest that all protected groups (such as gender or race) must have approximately equal representation in all clusters. Others have followed up with the idea of proportional representation in various forms (Ahmadian et al., 2019; Backurs et al., 2019; Bera et al., 2019; Kleindessner et al., 2019b; Ahmadian et al., 2020). While these approaches assume that the sensitive attributes under consideration are binary or categorical, others have worked with real-valued attributes (Abraham et al., 2020).

All approaches listed above assume that the algorithm can directly observe the sensitive attributes that provide a notion of diversity, hence assuming that protected groups are readily available. On the other hand, in the real world, diversity can be an intrinsic or latent feature of individuals connected by a social network. Further, in most cases, fairness is dealt with at the level of the population. However, it is also important to consider a fairness notion at the individual level. Motivated by this, we address the following questions. How can we define fairness when the sensitive attributes are not observable, but protected groups are latent in a representation graph? Can spectral graph methods solve this problem, especially with individual fairness constraints? If so, is the resulting spectral algorithm consistent?

Central to our fairness definition is the notion of a representation graph. This graph contains a node for each individual in the dataset, and two nodes are connected if they are similar with respect to sensitive attributes or represent each other's interests. We define our fairness notion from each individual's viewpoint and require the individual's neighbors in the representation graph to be approximately proportionally represented in all clusters. In Section 3, we show that this notion generalizes the case of fully observed sensitive attributes and is compatible with both discrete and real-valued attributes.

Most common fairness notions enforce statistical fairness that applies at the population level but may not consider individuals' interests. For example, the proportional representation of protected groups (Chierichetti et al., 2017) implicitly assumes that every pair of members from a protected group can represent each other's interests. On the other hand, the representation graph allows us to define fairness from the individuals' perspective while also being equivalent to statistical fairness in a specific setting, as shown in Section 3. While approaches that consider individual fairness for clustering do exist, they do not use sensitive attributes but instead rely on examples being close to cluster centroids (Mahabadi & Vakilian, 2020; Jung et al., 2020; Chen et al., 2019).

We propose a spectral clustering algorithm to partition or cluster a graph G, where clusters are fair for each individual under a representation graph R. Following Kleindessner et al. (2019b), we add the fairness objective as a constraint to the spectral clustering optimization problem. However, as noted before, our fairness notion is more expressive than the one used by Kleindessner et al. (2019b). We also propose a variant of the well-known stochastic block model (Holland et al., 1983) with respect to the representation graph R and establish that our algorithm recovers fair clusters with high probability under this model (weak consistency). Note that spectral clustering solves a relaxation of an NP-hard optimization problem (von Luxburg, 2007). Thus, as in the case of Kleindessner et al. (2019b), we cannot guarantee in general that the clusters returned by the algorithm will be fair. However, our experiments show that incorporating the fairness constraint does yield clusters that are more fair from an individual perspective in practice.

Our main contributions are: (i) the notion of individual fairness with respect to a representation graph that does not require observable sensitive attributes and generalizes statistical fairness; (ii) a spectral clustering algorithm that tries to find clusters that are fair from the individuals' perspective; (iii) theoretical guarantees on the performance of our algorithm under the proposed variant of SBM; and (iv) empirical validation of our algorithm.

2 Notation and Preliminaries

Let V = {v_1, v_2, ..., v_N} be a set of nodes (e.g., a set of individuals) connected by a graph (or network) G. Our aim is to cluster or partition G while satisfying a fairness criterion. Let R be a graph defined over V that represents how similar the individuals are with respect to some sensitive attributes. Another interpretation of R is that nodes are connected if they represent each other's interests. We refer to R as the sensitive attribute representation graph of V, or simply the representation graph. We use A and R ∈ {0, 1}^{N×N} to denote the adjacency matrices of the graphs G and R, respectively. We assume that G and R are undirected, and that G additionally has no self-loops. Thus, A and R are symmetric and A_ii = 0 for all i ∈ [N], where [n] := {1, 2, ..., n} for any integer n. Such graphs are often naturally available in many practical scenarios or can be constructed from observed features via kernels. For example, G can be a graph constructed based on non-sensitive attributes, and R can be a graph constructed based on sensitive attributes. We often use the terms nodes, vertices, and individuals interchangeably.

The aim is to find K ≥ 2 clusters C_1, C_2, ..., C_K ⊆ V in the graph G such that ∪_{i=1}^K C_i = V and C_i ∩ C_j = ∅ if i ≠ j. We further require these clusters to be approximately fair under the representation graph R using the fairness criterion defined in Section 3. Towards this end, we modify the spectral clustering algorithm in Section 4 to include the fairness criterion as a constraint. In this section, we recall the optimization problem solved by spectral clustering.

The ratio-cut objective, given by

$$\mathrm{RatioCut}(\mathcal{C}_1, \ldots, \mathcal{C}_K) = \sum_{i=1}^{K} \frac{\mathrm{Cut}(\mathcal{C}_i, \mathcal{V} \setminus \mathcal{C}_i)}{|\mathcal{C}_i|},$$

provides a measure of the goodness of clusters C_1, ..., C_K (von Luxburg, 2007). Here, $\mathrm{Cut}(\mathcal{A}, \mathcal{B}) = \frac{1}{2} \sum_{v_i \in \mathcal{A}, v_j \in \mathcal{B}} A_{ij}$ counts the edges between subsets A, B ⊆ V. A good clustering has a low ratio-cut value.

Define H ∈ R^{N×K} as

$$H_{ij} = \begin{cases} \frac{1}{\sqrt{|\mathcal{C}_j|}} & \text{if } v_i \in \mathcal{C}_j \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

Further, let L = D − A denote the Laplacian of the similarity graph G, where D ∈ R^{N×N} is a diagonal matrix with $D_{ii} = \sum_{j=1}^{N} A_{ij}$ for all i ∈ [N]. It can be easily verified that RatioCut(C_1, ..., C_K) = trace{HᵀLH} (von Luxburg, 2007). Thus, to find good clusters, one can minimize trace{HᵀLH}, where H is of the form (1).

However, this is computationally hard due to the combinatorial nature of the constraint requiring H to be of the form (1). Spectral clustering instead solves the following relaxed optimization problem:

$$\min_{H \in \mathbb{R}^{N \times K}} \; \mathrm{trace}\{H^\intercal L H\} \quad \text{s.t.} \quad H^\intercal H = I. \tag{2}$$

Note that if H is of the form (1), then indeed HᵀH = I.
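For concreteness, the relaxed problem (2) is solved by the eigenvectors of L corresponding to its K smallest eigenvalues, followed by k-means on the rows of the resulting matrix. Below is a minimal sketch in Python; the function name and the dense-matrix layout are our own illustration, not the paper's code.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(A, K):
    """Unnormalized spectral clustering via the ratio-cut relaxation (2)."""
    # Laplacian L = D - A of the similarity graph.
    L = np.diag(A.sum(axis=1)) - A
    # Columns of H: eigenvectors for the K smallest eigenvalues of L.
    _, H = eigh(L, subset_by_index=[0, K - 1])
    # Recover discrete clusters from the relaxed solution with k-means.
    return KMeans(n_clusters=K, n_init=10).fit_predict(H)
```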

3 Fairness Criterion

Consider the topic of a "good" economy, where individual opinions lie on a broad spectrum. Suppose the goal is to partition the population into clusters that can frame their separate economic policies. However, as members can trade across clusters, one cluster's policies will likely also affect others' prosperity. Thus, every individual wishes to have other like-minded people representing their viewpoint in other clusters. But being like-minded is often a complex function of several factors. For example, individuals having the same perspective on the economy may not always trust each other due to, for instance, personality differences. A representation graph specifies pairs of individuals who believe that they can represent each other's viewpoint in different clusters. The proposed fairness criterion is defined as follows.

Definition 3.1 (Fairness Criterion). Given a representation graph R, a vertex v_i finds clusters C_1, ..., C_K of G fair if all its neighbors in R are proportionally represented in each cluster. That is,

$$\frac{|\{j : R_{ij} = 1 \wedge v_j \in \mathcal{C}_k\}|}{|\mathcal{C}_k|} = \frac{|\{j : R_{ij} = 1\}|}{N}, \quad \forall \, k \in [K]. \tag{3}$$

Clusters C_1, ..., C_K are said to be fair with respect to a representation graph R if all vertices in V find the clusters fair.

The next lemma specifies a condition on the matrix H (defined in (1)) which ensures that the corresponding clusters satisfy the fairness criterion given above. All proofs have been deferred to the supplementary material.

Lemma 3.2. Let H ∈ R^{N×K} have the form specified in (1). The condition

$$R \Big( I - \frac{1}{N} \mathbf{1}\mathbf{1}^\intercal \Big) H = 0_{N \times K} \tag{4}$$

implies that the corresponding clusters C_1, ..., C_K are fair under the representation graph R. Here, I is the N × N identity matrix and 1 is the N-dimensional all-ones vector.
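The condition in (4) is easy to test numerically once a candidate clustering is given. The sketch below is our own illustration (the helper name `is_fair` and the `labels` array are hypothetical): it builds H as in (1) and checks whether the residual of (4) vanishes.

```python
import numpy as np

def is_fair(R, labels, K, tol=1e-9):
    """Check the sufficient condition (4): R(I - 11^T/N) H = 0."""
    N = R.shape[0]
    H = np.zeros((N, K))
    for j in range(K):
        members = labels == j          # labels: integer cluster per node
        # H as in (1): 1/sqrt(|C_j|) on the members of cluster j.
        H[members, j] = 1.0 / np.sqrt(members.sum())
    P = np.eye(N) - np.ones((N, N)) / N    # centering projector I - 11^T/N
    return np.abs(R @ P @ H).max() < tol
```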

The next few remarks highlight the important properties of our fairness notion.

Statistical fairness as a special case: Several approaches assign each vertex to one of P protected groups (Chierichetti et al., 2017; Kleindessner et al., 2019b) and require these protected groups to have proportional representation in all clusters, i.e.,

$$\frac{|\mathcal{P}_i \cap \mathcal{C}_j|}{|\mathcal{C}_j|} = \frac{|\mathcal{P}_i|}{N}, \quad \forall \, i \in [P], \; j \in [K],$$

where P_i ⊆ V is the i-th protected group. This is an example of statistical fairness. A representation graph where R_ij = 1 if and only if v_i and v_j belong to the same protected group reduces the fairness criterion in Definition 3.1 to the statistical fairness criterion given above. Thus, statistical fairness is a special case of our fairness notion. In particular, we strictly generalize the approach presented in Kleindessner et al. (2019b). Also noteworthy is the assumption made by statistical fairness, namely that every pair of vertices in a protected group can represent each other's interests, or that they are very similar with respect to some sensitive attributes (notice the choice of R above). This assumption becomes unreasonable as protected groups grow in size.

Sensitive attributes and protected groups: The proposed fairness criterion only requires a representation graph R. It has two advantages over existing fairness criteria: (i) it does not require observable sensitive attributes; (ii) even if sensitive attributes are provided, one need not specify the number of protected groups or explicitly compute them. For example, individuals in R may be connected if their age difference is less than five years and if they went to the same school. Crucially, the sensitive attributes used to construct R may be numerical, binary, categorical, etc.

Realizability: Clearly, not all representation graphs admit a fair solution. In particular, because the columns of H ∈ R^{N×K} in (1) are orthogonal, (4) cannot be satisfied if the dimension of null(R(I − 11ᵀ/N)) is less than K. Intuitively, as rank{R(I − 11ᵀ/N)} increases, it becomes harder to satisfy (4). In particular, any clustering is fair when R = 11ᵀ (i.e., every individual can represent every other individual), and hence rank{R(I − 11ᵀ/N)} = 0. We impose additional constraints on R in our consistency analysis in Section 5 to ensure that a fair clustering always exists. While those constraints are sufficient, they are not necessary, and hence our fairness notion is applicable more generally.

In the next section, we modify the spectral clustering algorithm to discover clusters that approximately satisfy our fairness criterion.

4 Algorithm

Recall from Section 2 that spectral clustering approximately minimizes the ratio-cut objective by relaxing an NP-hard optimization problem to the one given in (2). Lemma 3.2 provides a sufficient condition that a matrix H of the form (1) must satisfy to ensure that the corresponding clusters are fair. Ideally, we would like to solve the following optimization problem to get fair clusters:

$$\min_{H} \; \mathrm{trace}\{H^\intercal L H\} \quad \text{s.t.} \quad H \text{ is of the form (1)}; \;\; R\Big(I - \frac{1}{N}\mathbf{1}\mathbf{1}^\intercal\Big) H = 0_{N \times K}.$$

However, as noted before, the constraint on H makes this problem NP-hard. Following Kleindessner et al. (2019b), we instead solve the following relaxed problem:

$$\min_{H} \; \mathrm{trace}\{H^\intercal L H\} \quad \text{s.t.} \quad H^\intercal H = I; \;\; R\Big(I - \frac{1}{N}\mathbf{1}\mathbf{1}^\intercal\Big) H = 0_{N \times K}. \tag{5}$$

Clearly, the columns of H must belong to the null space of R(I − 11ᵀ/N). Thus, any feasible H can be expressed as H = Y Z for some matrix Z ∈ R^{(N−r)×K}, where Y ∈ R^{N×(N−r)} is an orthonormal matrix containing the basis vectors of the null space of R(I − 11ᵀ/N) as its columns. Here, r is the rank of R(I − 11ᵀ/N). Because YᵀY = I, we have HᵀH = ZᵀYᵀY Z = ZᵀZ. Thus, HᵀH = I ⇔ ZᵀZ = I. The following optimization problem is equivalent to (5) by setting H = Y Z:

$$\min_{Z} \; \mathrm{trace}\{Z^\intercal Y^\intercal L Y Z\} \quad \text{s.t.} \quad Z^\intercal Z = I. \tag{6}$$

As in standard spectral clustering (von Luxburg, 2007), the solution to (6) is given by the eigenvectors corresponding to the K smallest eigenvalues of YᵀLY. Of course, N − r must be at least K for K eigenvectors to exist, as YᵀLY has dimensions (N − r) × (N − r). The clusters can then be recovered by using k-means clustering on the rows of H = Y Z.¹ Algorithm 1 summarizes this procedure, which we refer to as Gen-FairSC.

¹Standard spectral clustering follows a similar procedure. It first finds a matrix U ∈ R^{N×K} whose columns are the first K eigenvectors of the Laplacian L. Then, it uses k-means on the rows of U to obtain the clusters.

Algorithm 1 Gen-FairSC
1: Input: Adjacency matrix A, number of clusters K, representation matrix R
2: Output: Clusters C_1, C_2, ..., C_K
3: Compute Y containing orthonormal basis vectors of the null space of R(I − (1/N)11ᵀ)
4: Compute the Laplacian L = D − A
5: Compute the eigenvectors corresponding to the K smallest eigenvalues of YᵀLY. Let Z contain these vectors as its columns.
6: Apply k-means clustering to the rows of H = Y Z

Relation to Kleindessner et al. (2019b): Kleindessner et al. (2019b) also follow the same strategy and introduce their fairness criterion as an equality constraint in the spectral clustering optimization problem. As noted in Section 3, a particular configuration of R reduces our fairness criterion to theirs. Thus, their algorithm is a special case of Gen-FairSC, and hence our algorithm's name is justified, as it is also compatible with more general representation graphs for which rank{R(I − 11ᵀ/N)} ≤ N − K.
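A minimal dense-matrix sketch of Algorithm 1 follows; this is our own illustrative code, not the authors' released implementation, and it assumes rank{R(I − 11ᵀ/N)} ≤ N − K so that the required eigenvectors exist.

```python
import numpy as np
from scipy.linalg import eigh, null_space
from sklearn.cluster import KMeans

def gen_fair_sc(A, R, K):
    """A sketch of Gen-FairSC (Algorithm 1) with dense matrices."""
    N = A.shape[0]
    # Line 3: orthonormal basis Y of null(R(I - 11^T / N)).
    Y = null_space(R @ (np.eye(N) - np.ones((N, N)) / N))   # N x (N - r)
    # Line 4: Laplacian L = D - A of the similarity graph G.
    L = np.diag(A.sum(axis=1)) - A
    # Line 5: eigenvectors of Y^T L Y for its K smallest eigenvalues.
    _, Z = eigh(Y.T @ L @ Y, subset_by_index=[0, K - 1])
    # Line 6: k-means on the rows of H = YZ recovers the clusters.
    return KMeans(n_clusters=K, n_init=10).fit_predict(Y @ Z)
```

For large graphs, sparse eigensolvers would replace these dense routines, at the cost of a more involved null-space computation.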

Fairness of the clusters: Note that the constraint R(I − 11ᵀ/N)H = 0 implies the fairness of clusters only when H has the form given in (1). Thus, a feasible solution to the relaxed optimization problem in (5) may not necessarily result in fair clusters. In fact, as noted in Kleindessner et al. (2019b), there are no general guarantees that bound the difference between the optimal solution of (2) and the optimal solution of the original NP-hard ratio-cut problem. Hence, we cannot guarantee the fairness of clusters discovered by solving (6) instead of (5) in the general case. Nonetheless, we show in Section 5 that the discovered clusters are indeed fair under certain assumptions.

Computational complexity: Algorithm 1 has a time complexity of O(N³) and a space complexity of O(N²). Finding the null space in Line 3 and computing the eigenvectors in Line 5 are the computationally dominant steps. This matches the worst-case complexity of both the standard spectral clustering algorithm and the algorithm given in Kleindessner et al. (2019b). For small K, several approximations can reduce the complexity, but most such techniques work only when K = 2 (Yu & Shi, 2004; Xu et al., 2009).

5 Theoretical Guarantees

In this section, we show that Algorithm 1 recovers the ground-truth clusters with high probability, under mild assumptions on the representation graph, for networks sampled from a modified variant of the Stochastic Block Model (SBM) (Holland et al., 1983). The ground-truth clusters are fair by construction, as described in Section 5.2.

5.1 R-SBM

The well-known Stochastic Block Model (SBM) (Holland et al., 1983) takes as input a function π : [N] → [K] that assigns each vertex v_i ∈ V to one of K clusters. Then, independently for all node pairs (v_i, v_j) such that i > j,

$$\mathbb{P}(A_{ij} = 1) = B_{\pi(i)\pi(j)},$$

where B ∈ [0, 1]^{K×K} is a symmetric matrix. The entry B_{kℓ} specifies the probability of a connection between two nodes that belong to clusters C_k and C_ℓ, respectively. A commonly used variant of SBM assumes B_{kk} = α and B_{kℓ} = β for all k, ℓ ∈ [K] such that k ≠ ℓ. We define an SBM with respect to a representation graph R as follows.

Definition 5.1 (R-SBM). An R-SBM is defined by the tuple (π, R, p, q, r, s), where π : [N] → [K] maps vertices in V to clusters, R is a representation graph, and 1 ≥ p ≥ q ≥ r ≥ s ≥ 0 are probabilities used for sampling edges. Under this model,

$$\mathbb{P}(A_{ij} = 1) = \begin{cases} p & \text{if } \pi(i) = \pi(j) \text{ and } R_{ij} = 1, \\ q & \text{if } \pi(i) \neq \pi(j) \text{ and } R_{ij} = 1, \\ r & \text{if } \pi(i) = \pi(j) \text{ and } R_{ij} = 0, \\ s & \text{if } \pi(i) \neq \pi(j) \text{ and } R_{ij} = 0. \end{cases} \tag{7}$$

When vertices v_i and v_j belong to the same cluster, they have a higher probability of connection for a fixed value of R_ij (p > q and r > s). However, vertices also have a higher tendency to connect if they are connected in R, even if they do not belong to the same cluster (q > r). When R itself has a community structure, there are two natural ways to cluster the vertices: (i) based on the ground-truth clusters C_1, C_2, ..., C_K specified by π; and (ii) based on the clusters in R.

The clusters based on communities in R are likely to not satisfy the fairness criterion in Definition 3.1. To see this, note that tightly connected vertices in R would be assigned to the same cluster in this case, rather than being spread evenly across clusters as demanded by the fairness criterion. On the other hand, the ground-truth clusters can be constructed to be fair, as we demonstrate in Section 5.2 after enforcing more constraints on R. Assuming that the ground-truth clusters are fair, the goal is to show that Algorithm 1 recovers the ground-truth clusters with high probability rather than returning any other natural but unfair clusters.

5.2 Consistency Result

As noted in Section 3, some representation graphs lead to fairness constraints that cannot be satisfied. For example, consider an R where a vertex v_i has only two neighbors. It is not possible for this vertex to have a representative in all clusters if K > 2. Therefore, some additional assumptions are required on R to ensure that a fair clustering is indeed possible.

Assumption 1. rank{R} ≤ N −K.

Note that I − 11ᵀ/N is a projection matrix, and 1 is its eigenvector with eigenvalue 0. Any vector orthogonal to 1 is also an eigenvector, with eigenvalue 1. Thus, rank{I − 11ᵀ/N} = N − 1. Because rank{R(I − 11ᵀ/N)} ≤ min(rank{R}, rank{I − 11ᵀ/N}), requiring rank{R} ≤ N − K ensures that rank{R(I − 11ᵀ/N)} ≤ N − K, which is necessary for (6) to have a solution.

Assumption 2. All ground-truth clusters have the same number of vertices. Thus, |C_i| = N/K for all i ∈ [K]. This implicitly requires N to be a multiple of K.

This assumption, together with the next one, allows us to compute the smallest K eigenvalues of the Laplacian matrix in the expected case. This is a crucial step in proving Theorem 5.3.

Assumption 3. R is a d-regular graph for some K ≤ d ≤ N. Moreover, R_ii = 1 for all i ∈ [N], and each vertex in R is connected to d/K vertices from cluster C_i for all i ∈ [K] (including the self-loop). As in Assumption 2, this requires d to be a multiple of K.

Recall that the function π : [N] → [K] assigns each node to a cluster in R-SBM. Assumption 3 ensures the existence of a π for which the corresponding ground-truth clusters are fair under R. Namely, let π(i) = k if (k − 1)N/K < i ≤ kN/K, for all i ∈ [N] and k ∈ [K]. It can be easily verified that the resulting clusters C_k = {v_i : π(i) = k}, k ∈ [K], are fair under R; a short numerical check is sketched below.
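The following small sketch is our own illustrative construction (the offset set `S` is an assumption, one of many valid choices): it builds a d-regular R satisfying Assumption 3 for contiguous block clusters and verifies its regularity. By construction, every node then has exactly d/K representatives in each block cluster, so the block clusters are fair.

```python
import numpy as np

# N nodes in K equal clusters (Assumption 2); pi assigns contiguous blocks.
N, K = 20, 4
m = N // K
pi = np.repeat(np.arange(K), m)           # pi(i) = k for the k-th block

# A d-regular R satisfying Assumption 3: connect i and j whenever
# (i - j) mod m lies in a symmetric offset set S containing 0, so every
# node has exactly |S| neighbours (including itself) in every cluster.
S = [0, 1, m - 1]                         # S = -S (mod m), with 0 in S
i, j = np.indices((N, N))
R = np.isin((i - j) % m, S).astype(int)

assert (R == R.T).all() and (np.diag(R) == 1).all()
assert (R.sum(axis=1) == K * len(S)).all()    # d-regular, d = K|S|
# Each node has d/K representatives in every ground-truth cluster,
# so the block clusters satisfy Definition 3.1 (cf. is_fair in Section 3).
```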

Remark 5.2. It is important to note that, in practice, Algorithm 1 only requires Assumption 1 to ensure that YᵀLY has at least K orthonormal eigenvectors. Assumptions 2 and 3 are only needed for our theoretical analysis.

The next theorem states our main result. It places a high-probability upper bound on the fraction of misclustered vertices for similarity graphs G drawn from an R-SBM, where the representation graph R satisfies Assumptions 1–3. Let Θ ∈ {0, 1}^{N×K} indicate the ground-truth cluster memberships, i.e., Θ_ij = 1 ⇔ v_i ∈ C_j. Similarly, Θ̂ ∈ {0, 1}^{N×K} indicates the clusters of G predicted by Algorithm 1, i.e., Θ̂_ij = 1 if and only if Algorithm 1 assigns v_i to the j-th cluster. Further, let 𝒥 be the set of all K × K permutation matrices. The fraction of misclustered nodes is given by

$$M(\Theta, \hat{\Theta}) = \min_{J \in \mathcal{J}} \frac{1}{N} \|\Theta - \hat{\Theta} J\|_0$$

(Lei & Rinaldo, 2015). Recall that the ground-truth clusters C_1, ..., C_K are fair by construction. Thus, a low M(Θ, Θ̂) indicates that the clusters returned by the algorithm are also fair. Theorem 5.3 uses the eigenvalues of the Laplacian matrix in the expected case, which is defined as L̄ = D̄ − Ā, where Ā = E[A] is the expected adjacency matrix of a graph sampled from R-SBM and D̄ ∈ R^{N×N} is a diagonal matrix such that D̄_ii = ∑_{j=1}^N Ā_ij for all i ∈ [N].

Theorem 5.3. Let µ_1 ≤ µ_2 ≤ ··· ≤ µ_{N−r} denote the eigenvalues of Yᵀ L̄ Y. Define γ = µ_{K+1} − µ_K. Under Assumptions 1–3, there exists a universal constant const(C, α) such that if γ satisfies

$$\gamma^2 \geq \mathrm{const}(C, \alpha)\,(2 + \epsilon)\, p N K \log N,$$

and p ≥ C log N / N for some C > 0, then

$$M(\Theta, \hat{\Theta}) \leq \mathrm{const}(C, \alpha)\, \frac{(2 + \epsilon)}{\gamma^2}\, p N \log N,$$

for every ε > 0, with probability at least 1 − 2N^{−α}, when a (1 + ε)-approximate algorithm for k-means clustering is used in Step 6 of Algorithm 1.

Theorem 5.3 implies that M(Θ, Θ̂) = o(1) with probability 1 − o(1) if γ = ω(√(pNK log N)).² Thus, in this case, Algorithm 1 is weakly consistent (Abbe, 2018). This condition on γ is often satisfied in many interesting cases. For example, when there are P protected groups as in Kleindessner et al. (2019b), the equivalent representation graph has P cliques that are not connected to each other (see the relation with statistical fairness in Section 3). One can show that γ = Θ(N/K) in this case (Kleindessner et al., 2019b), which satisfies the criterion given above if K is not too large.

²A note on the asymptotic notation: if f(n) = o(g(n)), then lim_{n→∞} f(n)/g(n) = 0. Similarly, if f(n) = ω(g(n)), then lim_{n→∞} f(n)/g(n) = ∞. Clearly, if f(n) = ω(g(n)), then g(n) = o(f(n)).

Theorem 5.3 requires a (1 + ε)-approximate solution to the k-means clustering problem. Several efficient algorithms have been proposed in the literature for this task (Kumar et al., 2004; Arthur & Vassilvitskii, 2007; Ahmadian et al., 2017). Such algorithms are also available in commonly used software packages like MATLAB and scikit-learn. The assumption p ≥ C log N / N controls the sparsity of the graph and is required to prove the consistency of normal spectral clustering as well (Lei & Rinaldo, 2015).

The proof of Theorem 5.3 is given in Appendix B. It follows the commonly used template for such results (Rohe et al., 2011; Lei & Rinaldo, 2015). We first compute the expected Laplacian matrix L̄ under R-SBM and show that its top K eigenvectors can be used to recover the ground-truth clusters. We then show that these top K eigenvectors lie in the null space of R(I − 11ᵀ/N), and hence are also the top K eigenvectors of Yᵀ L̄ Y. This implies that Algorithm 1 returns the ground-truth clusters in the expected case. We then use matrix perturbation arguments as in Lei & Rinaldo (2015) to establish the high-probability consistency result in the general case, when the graph is sampled from R-SBM.

Our analysis recovers the consistency result in Kleindessner et al. (2019b), where γ = Θ(N/K), as mentioned earlier. However, as opposed to them, our arguments apply more generally to d-regular representation graphs instead of being limited to the case where R is a block-diagonal matrix.

6 Related Work

Our work is closely related to constrained clustering (Basu et al., 2008), as fairness requirements restrict the set of acceptable clusters. Most methods for constrained clustering apply to variations of pairwise must-link or cannot-link constraints, which specify which pairs of nodes must or must not belong to the same cluster (Yu & Shi, 2001, 2004; Xu et al., 2009; Wang & Davidson, 2010; Rangapuram & Hein, 2012; Cucuringu et al., 2016). In contrast, fairness notions for clustering are usually concerned with the distribution of examples within clusters, for example, to enforce diversity (Chierichetti et al., 2017).

As noted before, several fairness notions for clustering have been proposed. While most study statistical fairness (Chierichetti et al., 2017; Rösner & Schmidt, 2018; Bercea et al., 2019; Backurs et al., 2019; Bera et al., 2019; Kleindessner et al., 2019b), a few study individual fairness as well (Chen et al., 2019; Mahabadi & Vakilian, 2020; Jung et al., 2020; Kleindessner et al., 2020; Anderson et al., 2020). Chierichetti et al. (2017) assume that examples belong to one of two protected groups and require each protected group to have a proportional representation in all clusters. Rösner & Schmidt (2018) extend this idea to multiple protected groups. Bercea et al. (2019) relax the requirement of exact proportions by allowing them to lie in a range. Bera et al. (2019) allow the protected groups to overlap with each other. Efficient algorithms for finding fair clusters under these criteria have also been proposed (Schmidt et al., 2018; Ahmadian et al., 2019; Kleindessner et al., 2019b).

Individual fairness criteria require each point to be sufficiently close to a cluster centroid (Mahabadi & Vakilian, 2020). For example, Chen et al. (2019) allow a group of points to form their own cluster if there is another center that is closer to all the points. Taking a different approach, Anderson et al. (2020) adapt the individual fairness criterion from supervised learning (Dwork et al., 2012) to the clustering setting. Our fairness notion is conditioned on the representation graph and can enforce both statistical and individual fairness for different configurations of this graph.

We develop a spectral clustering algorithm to find approximately fair clusters under the proposed fairness criterion. Spectral clustering is a popular choice among clustering algorithms (von Luxburg, 2007). It not only works well in practice but is also backed by strong theoretical guarantees on its performance (von Luxburg, 2007; Rohe et al., 2011; Lei & Rinaldo, 2015). It is especially suited to the practical case where one observes pairwise similarities between examples rather than directly observing the features. Most statistical guarantees for spectral clustering use the Stochastic Block Model (Holland et al., 1983).

Kleindessner et al. (2019b) proposed fair spectral clustering (FairSC) for the proportionality fairness criterion (Chierichetti et al., 2017). We follow a similar approach, but our algorithm considers the more general fairness criterion proposed in Section 3. Notably, we show that the algorithm in Kleindessner et al. (2019b) and their consistency analysis can be recovered as a special case of our setting. Moreover, our experiments demonstrate a few cases where our algorithm outperforms theirs.

[Figure 1: Accuracy versus the number of nodes (500–1500) for K = 5 clusters, comparing Gen-FairSC with FairSC (Kleindessner et al., 2019b) and standard spectral clustering (SC) on d-regular representation graphs. d = N/10, K = 5, and FairSC uses P = 5 protected groups, as explained before Section 7.1.]

7 Experiments

We perform three types of experiments. In the first two cases, we use synthetic data to validate our theoretical results using d-regular representation graphs, and to show that Algorithm 1 works well in practice even when the representation graph is not d-regular. Finally, we demonstrate the effectiveness of the proposed algorithm on a real-world dataset in Section 7.2.

FairSC (Kleindessner et al., 2019b) assumes that each vertex belongs to one of P protected groups P_1, ..., P_P ⊆ V. Our algorithm reduces to FairSC by using a representation graph where nodes are connected if and only if they belong to the same protected group. Thus, we only experiment with representation graphs R that are not of the above-mentioned form. Naturally, FairSC is not directly applicable in such settings. Nonetheless, to compare with FairSC, we cluster R using standard spectral clustering and use these clusters as protected groups for FairSC.

[Figure 2: Comparison between Gen-FairSC and FairSC (Kleindessner et al., 2019b) on representation graphs sampled from an SBM (500 nodes, 5 clusters). The x-axis varies the number of protected groups P for FairSC and the rank r of the representation graph for Gen-FairSC; the y-axes show average balance and modularity. Varying P in FairSC does not significantly improve the fairness or quality of clusters. However, the rank r can be tuned in Gen-FairSC to trade off between cluster quality and fairness.]

7.1 Experiments using Synthetic Data

Figure 1 compares the performance of Gen-FairSC with FairSC and standard spectral clustering. We sample d-regular representation graphs under Assumptions 2 and 3 for various values of N, using K = 5 and d = N/10. We then sample adjacency matrices corresponding to the representation graphs using R-SBM with parameters p = 0.4, q = 0.3, r = 0.2, and s = 0.1. Figure 1 shows the average accuracy of each algorithm across 10 independent runs for different values of N. Gen-FairSC is able to recover fair clusters (the ground-truth clusters are fair), whereas both spectral clustering and FairSC have inferior performance. This confirms our theoretical analysis from Theorem 5.3.

In the second case, the nodes are divided into P = 5 protected groups as in Kleindessner et al. (2019b), and R is sampled from an SBM. Nodes in the same (different) protected group(s) are connected with probability p_in = 0.8 (p_out = 0.2). Conditioned on this R, we sample an adjacency matrix from R-SBM as before. Note that such an R is not d-regular. Moreover, a graph R generated this way may violate Assumption 1, which is crucial to ensure that (6) is feasible. Thus, instead of using R directly, we use the best rank-r approximation of its adjacency matrix R in (6). This approximation need not have binary elements, but it works well in practice.

Let N_i = {j : R_ij = 1} be the set of neighbors of node v_i in R. We compute

$$\rho_i = \min_{k, \ell \in [K]} \frac{|\mathcal{C}_k \cap \mathcal{N}_i|}{|\mathcal{C}_\ell \cap \mathcal{N}_i|}$$

for each node i ∈ [N] to measure the fairness of the proposed clusters C_1, ..., C_K ⊆ [N]. While ρ_i looks similar to the notion of balance defined from the perspective of protected groups in Chierichetti et al. (2017), in this case it is associated with an individual v_i. Hence, we refer to it as the individual balance for node i. Clearly, 0 ≤ ρ_i ≤ 1, and higher values are desirable. We use the average balance ρ = (1/N) ∑_{i=1}^N ρ_i to measure the fairness of clusters.

Figure 2 uses N = 500, K = 5, P = 5, and the same values of p, q, r, and s as before. The x-axis corresponds to the number of protected groups (for FairSC³) and the rank r used in the approximation above (for Gen-FairSC). The y-axis represents the average balance ρ and the modularity score of the clusters. Figure 2 shows the trade-off between clustering accuracy and fairness. One can choose an appropriate value of r (for example, r = 300 in Figure 2) and use Gen-FairSC to get good quality clusters with a high balance. Note that both the modularity score and the balance are low for FairSC.

³The term "protected groups" is overloaded here. One set of protected groups is used to sample R from an SBM, as explained above. Then, we cluster R as explained before Section 7.1 to get the second set of protected groups, which is used by FairSC.
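A sketch of this metric follows (our own helper; `labels` is assumed to hold the integer cluster index of each node):

```python
import numpy as np

def average_individual_balance(R, labels, K):
    """Average of rho_i = min_{k,l} |C_k ∩ N_i| / |C_l ∩ N_i| over all nodes."""
    N = R.shape[0]
    rho = np.zeros(N)
    for i in range(N):
        # Neighbours of v_i in R, counted per cluster.
        counts = np.bincount(labels[R[i] == 1], minlength=K)
        # The min over cluster pairs equals min(counts)/max(counts); by
        # convention (an assumption), rho_i = 0 if v_i has no neighbours.
        rho[i] = counts.min() / counts.max() if counts.max() > 0 else 0.0
    return rho.mean()
```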

7.2 Experiments using Real Data

We use the FAO trade network (De Domenico et al., 2015), a multiplex network based on data from the Food and Agriculture Organization (FAO) of the United Nations. It has 214 nodes representing countries and 364 layers corresponding to commodities like coffee, banana, barley, etc. An edge between two countries in a layer indicates the volume of the corresponding commodity traded between them.

We convert the weighted graph in each layer to an unweighted graph by connecting every node with its five nearest neighbors and making all the edges undirected. We use the first 182 layers to construct the representation graph R: nodes in R are connected if they are linked in any of these layers. Similarly, the next 182 layers are used to construct the graph G. Note that the R constructed this way is not d-regular. The goal is to find clusters in G that are fair with respect to R.

Clusters based on G consider the trade of commodities 183–364 only. However, countries also have trade relations in R, leading to shared economic interests. Assume that the members of each cluster jointly formulate the economic policies for that cluster. However, these policies affect everyone, as they all share a global market, incentivizing the countries to influence the economic policies of all clusters. Being fair with respect to R entails that each country has members in other clusters with shared interests. This enables a country to indirectly shape the policies of other clusters.

[Figure 3: Balance and modularity scores for different values of r in Gen-FairSC and P in FairSC on the FAO trade network, with 2 clusters. Our algorithm finds better quality clusters with a higher balance.]

As before, we use the low-rank approximation of the representation graph in Gen-FairSC. Figure 3 compares the various algorithms and has the same semantics as Figure 2. Not only is Gen-FairSC able to achieve a higher balance, it does so with a better modularity score as compared to FairSC. In practice, a user would assess the relative importance of modularity and balance to select r. Some additional experiments are provided in Appendix D.
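The rank-r approximation used above can be sketched as follows; this is our own illustration, and the choice of an eigendecomposition (equivalently, an SVD for a symmetric matrix) is our reading of "best rank-r approximation" via the Eckart-Young theorem.

```python
import numpy as np

def low_rank_R(R, r):
    """Best rank-r approximation of the symmetric matrix R (Frobenius norm)."""
    vals, vecs = np.linalg.eigh(R)
    # Keep the r eigenvalue/eigenvector pairs of largest magnitude.
    top = np.argsort(np.abs(vals))[-r:]
    return vecs[:, top] @ np.diag(vals[top]) @ vecs[:, top].T
```

The resulting (generally non-binary) matrix is passed in place of R when forming the null-space constraint in (6).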

8 Conclusion

In this paper, we proposed an individual fairness criterion with respect to a representation graph R. We showed that our criterion can enforce both statistical and individual fairness under different configurations. We also developed a companion spectral clustering algorithm to find approximately fair clusters, proposed a variant of SBM called R-SBM, and established the weak consistency of this algorithm under R-SBM. Our experiments validate our theoretical claims. The theoretical guarantee in Section 5 applies only to d-regular representation graphs; showing guarantees for more general graphs is a promising direction for future work. It would also be interesting to explore other algorithms for finding fair clusters under the criterion proposed in Section 3.

References

Abbe, E. Community detection and stochastic block models: Recent developments. Journal of Machine Learning Research, 18(177):1–86, 2018.

Abraham, S. S., P., D., and Sundaram, S. S. Fairness in clustering with multiple sensitive attributes. Proceedings of the 23rd International Conference on Extending Database Technology, 2020.

Ahmadian, S., Norouzi-Fard, A., Svensson, O., and Ward, J. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. Symposium on Foundations of Computer Science (FOCS), 2017.

Ahmadian, S., Epasto, A., Kumar, R., and Mahdian, M. Clustering without over-representation. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 267–275, 2019.

Ahmadian, S., Epasto, A., Kumar, R., and Mahdian, M. Fair correlation clustering. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, 108:4195–4205, 2020.

Anderson, N., Bera, S. K., Das, S., and Liu, Y. Distributional individual fairness in clustering. arXiv e-prints, art. arXiv:2006.12589, 2020.

Arthur, D. and Vassilvitskii, S. k-means++: The advantages of careful seeding. Symposium on Discrete Algorithms (SODA), 2007.

Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., and Wagner, T. Scalable fair clustering. Proceedings of the 36th International Conference on Machine Learning, 97:405–413, 2019.

Basu, S., Davidson, I., and Wagstaff, K. Constrained Clustering: Advances in Algorithms, Theory, and Applications. CRC Press, 2008.

Bera, S., Chakrabarty, D., Flores, N., and Negahbani, M. Fair algorithms for clustering. In Advances in Neural Information Processing Systems, volume 32, pp. 4954–4965. Curran Associates, Inc., 2019.

Bercea, I., Gross, M., Khuller, S., Kumar, A., Rösner, C., Schmidt, D., and Schmidt, M. On the cost of essentially fair clusterings. APPROX-RANDOM, pp. 18:1–18:22, 2019.

Caton, S. and Haas, C. Fairness in machine learning: A survey, 2020.

Celis, E., Keswani, V., Straszak, D., Deshpande, A., Kathuria, T., and Vishnoi, N. Fair and diverse DPP-based data summarization. Proceedings of the 35th International Conference on Machine Learning, 80:716–725, 2018a.

Celis, L. E., Straszak, D., and Vishnoi, N. K. Ranking with fairness constraints, 2018b.

Chen, X., Fain, B., Lyu, L., and Munagala, K. Proportionally fair clustering. Proceedings of the 36th International Conference on Machine Learning, 97:1032–1041, 2019.

Chierichetti, F., Kumar, R., Lattanzi, S., and Vassilvitskii, S. Fair clustering through fairlets. In Advances in Neural Information Processing Systems, volume 30, pp. 5029–5037. Curran Associates, Inc., 2017.

Cucuringu, M., Koutis, I., Chawla, S., Miller, G., and Peng, R. Simple and scalable constrained clustering: a generalized spectral method. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 41:445–454, 2016.

De Domenico, M., Nicosia, V., Arenas, A., and Latora, V. Structural reducibility of multilayer networks. Nature Communications, 6(1), 2015.

Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. Fairness through awareness. ITCS '12, pp. 214–226, 2012.

Hardt, M., Price, E., and Srebro, N. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, volume 29, pp. 3315–3323. Curran Associates, Inc., 2016.

Holland, P. W., Laskey, K. B., and Leinhardt, S. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.

Jung, C., Kannan, S., and Lutz, N. A center in your neighborhood: Fairness in facility location. Symposium on Foundations of Responsible Computing (FORC), 2020.

Kleindessner, M., Awasthi, P., and Morgenstern, J. Fair k-center clustering for data summarization. Proceedings of the 36th International Conference on Machine Learning, 97:3448–3457, 2019a.

Kleindessner, M., Samadi, S., Awasthi, P., and Morgenstern, J. Guarantees for spectral clustering with fairness constraints. Proceedings of the 36th International Conference on Machine Learning, PMLR, 97:3458–3467, 2019b.

Kleindessner, M., Awasthi, P., and Morgenstern, J. A notion of individual fairness for clustering, 2020.

Kumar, A., Sabharwal, Y., and Sen, S. A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. Symposium on Foundations of Computer Science (FOCS), 2004.

Lei, J. and Rinaldo, A. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215–237, 2015.

Mahabadi, S. and Vakilian, A. Individual fairness for k-clustering. Proceedings of the 37th International Conference on Machine Learning, 119:6586–6596, 2020.

Rangapuram, S. S. and Hein, M. Constrained 1-spectral clustering. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 22:1143–1151, 2012.

Rohe, K., Chatterjee, S., and Yu, B. Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4):1878–1915, 2011.

Rösner, C. and Schmidt, M. Privacy preserving clustering with constraints. In ICALP, 2018.

Samadi, S., Tantipongpipat, U., Morgenstern, J. H., Singh, M., and Vempala, S. The price of fair PCA: One extra dimension. In Advances in Neural Information Processing Systems, volume 31, pp. 10976–10987. Curran Associates, Inc., 2018.

Schmidt, M., Schwiegelshohn, C., and Sohler, C. Fair coresets and streaming algorithms for fair k-means clustering. CoRR, abs/1812.10854, 2018.

von Luxburg, U. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

Vu, V. Q. and Lei, J. Minimax sparse principal subspace estimation in high dimensions. The Annals of Statistics, 41(6):2905–2947, 2013.

Wang, X. and Davidson, I. Flexible constrained spectral clustering. ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2010.

Xu, L., Li, W., and Schuurmans, D. Fast normalized cut with linear constraints. IEEE Conference on Computer Vision and Pattern Recognition, 2009.

Yu, S. X. and Shi, J. Grouping with bias. Advances in Neural Information Processing Systems, 14:1327–1334, 2001.

Yu, S. X. and Shi, J. Segmentation given partial grouping constraints. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2):173–183, 2004.

Supplementary Material

Protecting Individual Interests across Clusters: Spectral Clustering with Guarantees

The notation used in this paper is summarized below.

Symbol: Description

V: Set of nodes
v_i: An element of the set V
G: Given graph or network in which we need to find clusters
A: N × N adjacency matrix of G
R (graph): Representation graph
R (matrix): N × N adjacency matrix of the representation graph
N: Number of nodes in the graph
K: Number of ground-truth clusters in G
d: Degree of each node in the d-regular representation graph R
C_k: Set of nodes in the k-th cluster
p, q, r, s: Parameters used by R-SBM
Ā: Expected adjacency matrix of a graph sampled from R-SBM
𝒜: Ā + pI
D̄: Degree matrix corresponding to Ā
L̄: Laplacian corresponding to Ā and D̄, i.e., L̄ = D̄ − Ā
G_k: N × N diagonal matrix where (G_k)_ii = 1 if v_i ∈ C_k and 0 otherwise
||X||: √(λ_max(XᵀX)), the square root of the maximum eigenvalue of XᵀX
Y: N × (N − r) matrix containing an orthonormal basis for null(R(I − 11ᵀ/N))
y_i: i-th column of Y
Z̄ ∈ R^{(N−r)×K}: Solution to (6) in the expected case
Z ∈ R^{(N−r)×K}: Solution to (6) in the observed case
µ_i: i-th smallest eigenvalue of Yᵀ L̄ Y
γ: µ_{K+1} − µ_K, the eigengap

A Proof of Lemma 3.2

Fix an arbitrary node v_i ∈ V and k ∈ [K]. Recall that H is such that

$$H_{ik} = \begin{cases} \frac{1}{\sqrt{|\mathcal{C}_k|}} & \text{if } v_i \in \mathcal{C}_k \\ 0 & \text{otherwise.} \end{cases}$$

Because R(I − 11ᵀ/N)H = 0,

$$\sum_{j=1}^{N} R_{ij} H_{jk} = \frac{1}{N} \Big( \sum_{j=1}^{N} R_{ij} \Big) \Big( \sum_{j=1}^{N} H_{jk} \Big)$$
$$\Rightarrow \; \frac{1}{\sqrt{|\mathcal{C}_k|}} \, |\{v_j \in \mathcal{V} : R_{ij} = 1 \wedge v_j \in \mathcal{C}_k\}| = \frac{1}{N} \, |\{v_j \in \mathcal{V} : R_{ij} = 1\}| \, \frac{|\mathcal{C}_k|}{\sqrt{|\mathcal{C}_k|}}$$
$$\Rightarrow \; \frac{|\{v_j \in \mathcal{V} : R_{ij} = 1 \wedge v_j \in \mathcal{C}_k\}|}{|\mathcal{C}_k|} = \frac{|\{v_j \in \mathcal{V} : R_{ij} = 1\}|}{N}.$$

Because this holds for an arbitrary v_i ∈ V and k ∈ [K], R(I − 11ᵀ/N)H = 0 implies the fairness criterion in Definition 3.1.

B Proof of Theorem 5.3

We prove Theorem 5.3 via a series of technical lemmas; see Appendix C for the proofs of these lemmas. We begin by showing that Algorithm 1 recovers the ground-truth clusters when the expected adjacency matrix of graph G is specified as input. For the remainder of this proof, we will assume that Assumptions 1–3 are satisfied.

The first lemma shows that certain vectors that can be used to recover the ground-truth clusters indeed satisfy the fairness criterion in (4).

Lemma B.1. The N-dimensional vector of all ones, denoted by 1, is an eigenvector of R with eigenvalue d. Define u_k ∈ R^N for k ∈ [K − 1] as

$$(u_k)_i = \begin{cases} 1 & \text{if } v_i \in \mathcal{C}_k \\ -\frac{1}{K-1} & \text{otherwise,} \end{cases}$$

where (u_k)_i is the i-th element of u_k. Then, 1, u_1, ..., u_{K−1} ∈ null(R(I − (1/N)11ᵀ)). Moreover, 1, u_1, ..., u_{K−1} are linearly independent.

We use Ā to denote the expected adjacency matrix of graph G. Clearly, Ā = 𝒜 − pI, where 𝒜 is such that 𝒜_ij = P(A_ij = 1) for i ≠ j (see (7)) and 𝒜_ii = p otherwise. Note that

$$\mathcal{A} x = \lambda x \;\Leftrightarrow\; \bar{A} x = (\lambda - p) x. \tag{8}$$

Simple algebra shows that 𝒜 can be written as

$$\mathcal{A} = qR + s(\mathbf{1}\mathbf{1}^\intercal - R) + (p - q) \sum_{k=1}^{K} G_k R G_k + (r - s) \sum_{k=1}^{K} G_k (\mathbf{1}\mathbf{1}^\intercal - R) G_k, \tag{9}$$

where for all k ∈ [K], G_k ∈ R^{N×N} is a diagonal matrix such that (G_k)_ii = 1 if v_i ∈ C_k and 0 otherwise. The next lemma shows that 1, u_1, ..., u_{K−1} defined in Lemma B.1 are eigenvectors of 𝒜.

Lemma B.2. Let 1, u_1, ..., u_{K−1} be as defined in Lemma B.1. Then,

$$\mathcal{A} \mathbf{1} = \lambda_1 \mathbf{1}, \;\; \text{where} \;\; \lambda_1 = qd + s(N - d) + (p - q)\frac{d}{K} + (r - s)\frac{N - d}{K}, \;\; \text{and}$$
$$\mathcal{A} u_k = \lambda_{1+k} u_k, \;\; \text{where} \;\; \lambda_{1+k} = (p - q)\frac{d}{K} + (r - s)\frac{N - d}{K}.$$

Let L̄ = D̄ − Ā be the expected Laplacian matrix, where D̄ is a diagonal matrix with D̄_ii = ∑_{j=1}^N Ā_ij for all i ∈ [N]. It is easy to see that D̄_ii = λ_1 − p for all i ∈ [N], as Ā1 = (λ_1 − p)1 by (8) and Lemma B.2. Thus, D̄ = (λ_1 − p)I, and hence any eigenvector of 𝒜 with eigenvalue λ is also an eigenvector of L̄ with eigenvalue λ_1 − λ. That is, if 𝒜x = λx,

$$\bar{L} x = (\bar{D} - \bar{A}) x = \big( (\lambda_1 - p) I - (\mathcal{A} - pI) \big) x = (\lambda_1 - \lambda) x. \tag{10}$$

Hence, the eigenvectors of L̄ corresponding to the K smallest eigenvalues are the same as the eigenvectors of 𝒜 corresponding to the K largest eigenvalues.

Recall that the columns of Y in (6) contain an orthonormal basis for the null space of R(I − 11ᵀ/N). In the absence of a fairness constraint (or equivalently, if Y = I_{N×N}), the optimal solution to (6) would have been the first K eigenvectors of L̄. However, with the fairness constraint, we only need to optimize over vectors that belong to the null space of R(I − 11ᵀ/N). By Lemma B.1, 1, u_1, ..., u_{K−1} ∈ null(R(I − 11ᵀ/N)) and are linearly independent. Let y_1 = 1/√N, and let y_2, ..., y_K be orthonormal vectors that span the same space as u_1, ..., u_{K−1}. The next lemma computes such y_2, ..., y_K.

Lemma B.3. Define y_{1+k} ∈ R^N for k ∈ [K − 1] as

$$(y_{1+k})_i = \begin{cases} 0 & \text{if } v_i \in \mathcal{C}_{k'} \text{ s.t. } k' < k, \\ \dfrac{K - k}{\sqrt{\frac{N}{K}(K - k)(K - k + 1)}} & \text{if } v_i \in \mathcal{C}_k, \\ -\dfrac{1}{\sqrt{\frac{N}{K}(K - k)(K - k + 1)}} & \text{otherwise.} \end{cases}$$

Then, for k ∈ [K − 1], the y_{1+k} are orthonormal vectors that span the same space as u_1, u_2, ..., u_{K−1}, and y_1ᵀ y_{1+k} = 0.
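These vectors can be verified numerically. The short sketch below (our own, assuming contiguous equal-sized clusters as in Section 5.2) constructs y_1, ..., y_K and checks their orthonormality:

```python
import numpy as np

N, K = 20, 4
m = N // K                                       # cluster size N/K
Y = np.zeros((N, K))
Y[:, 0] = 1.0 / np.sqrt(N)                       # y_1 = 1/sqrt(N)
for k in range(1, K):                            # k = 1, ..., K-1
    c = np.sqrt(m * (K - k) * (K - k + 1))
    Y[(k - 1) * m: k * m, k] = (K - k) / c       # entries on C_k
    Y[k * m:, k] = -1.0 / c                      # entries on C_{k'}, k' > k
# Columns are orthonormal; in particular y_1 is orthogonal to each y_{1+k}.
assert np.allclose(Y.T @ Y, np.eye(K))
```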

Let T ∈ R^{N×K} be such that it has y_1, ..., y_K as its columns. If two nodes belong to the same cluster, the rows corresponding to these nodes in TU are the same for any U ∈ R^{K×K} such that UᵀU = I. Thus, any K orthonormal vectors belonging to the span of y_1, ..., y_K can be used to recover the ground-truth clusters.

Let Z̄ ∈ R^{(N−r)×K} be a solution to the optimization problem (6) in the expected case, with Ā as input. The next lemma shows that the columns of Y Z̄ indeed lie in the span of y_1, ..., y_K. Thus, the k-means clustering step in Algorithm 1 returns the correct ground-truth clusters when Ā is passed as input.

Lemma B.4. Let y_1 = 1/√N and y_{1+k} be as defined in Lemma B.3 for k ∈ [K − 1]. Further, let Z̄ be the optimal solution to the optimization problem in (6) with L set to L̄. Then, the columns of Y Z̄ lie in the span of y_1, y_2, ..., y_K.

Now that we have established that Algorithm 1 returns the correct ground-truth clusters in the expected case, we use arguments from matrix perturbation theory to show a high-probability bound on the number of mistakes made by the algorithm. In particular, we need an upper bound on ||YᵀLY − Yᵀ L̄ Y||, where L is the Laplacian matrix of a graph randomly sampled from R-SBM and ||X|| = √(λ_max(XᵀX)) for any matrix X.

Note that ||Y|| = ||Yᵀ|| = 1, as YᵀY = I. Thus,

$$\|Y^\intercal L Y - Y^\intercal \bar{L} Y\| \leq \|Y^\intercal\| \, \|L - \bar{L}\| \, \|Y\| = \|L - \bar{L}\|. \tag{11}$$

Moreover,

$$\|L - \bar{L}\| = \|D - A - (\bar{D} - \bar{A})\| \leq \|D - \bar{D}\| + \|A - \bar{A}\|.$$

The next two lemmas bound the two terms on the right-hand side of the inequality above, thus providing an upper bound on ||L − L̄||, and hence on ||YᵀLY − Yᵀ L̄ Y|| by (11).

Lemma B.5. Assume that p ≥ C log N / N for some constant C > 0. Then, for every α > 0, there exists a constant const₁(C, α) that only depends on C and α such that

$$\|D - \bar{D}\| \leq \mathrm{const}_1(C, \alpha) \sqrt{pN \log N}$$

with probability at least 1 − N^{−α}.

Lemma B.6. Assume that p ≥ C log N / N for some constant C > 0. Then, for every α > 0, there exists a constant const₂(C, α) that only depends on C and α such that

$$\|A - \bar{A}\| \leq \mathrm{const}_2(C, \alpha) \sqrt{pN}$$

with probability at least 1 − N^{−α}.

From Lemmas B.5 and B.6, we conclude that there always exists a constant const₃(C, α) = max{const₁(C, α), const₂(C, α)} such that, for any α > 0, with probability at least 1 − 2N^{−α},

$$\|Y^\intercal L Y - Y^\intercal \bar{L} Y\| \leq \|L - \bar{L}\| \leq \mathrm{const}_3(C, \alpha) \sqrt{pN \log N}. \tag{12}$$

Let Z̄ and Z denote the optimal solutions of (6) in the expected (L replaced with L̄) and observed cases, respectively. We will use (12) to show a bound on ||Y Z̄ − Y Z||_F in Lemma B.7. This will be used to argue that Algorithm 1 makes a small number of mistakes when the graph is sampled from R-SBM.

Lemma B.7. Let µ_1 ≤ µ_2 ≤ ··· ≤ µ_{N−r} be the eigenvalues of Yᵀ L̄ Y and α_1 ≤ α_2 ≤ ··· ≤ α_{N−r} be the eigenvalues of Yᵀ L Y. Further, let the columns of Z̄ ∈ R^{(N−r)×K} and Z ∈ R^{(N−r)×K} contain the eigenvectors corresponding to the K smallest eigenvalues of Yᵀ L̄ Y and Yᵀ L Y, respectively. Define γ = µ_{K+1} − µ_K. Then, with probability at least 1 − 2N^{−α},

$$\inf_{U \in \mathbb{R}^{K \times K} : \, U U^\intercal = U^\intercal U = I} \|Y \bar{Z} - Y Z U\|_F \leq \mathrm{const}_3(C, \alpha) \, \frac{4\sqrt{2K}}{\gamma} \sqrt{pN \log N},$$

where const₃(C, α) is the constant from (12).

Let T ∈ R^{N×K} be a matrix containing the first K columns of Y, and let T_i denote the i-th row of T. A simple calculation using Lemma B.3 shows that

$$\|T_i - T_j\|_2 = \begin{cases} 0 & \text{if } v_i \text{ and } v_j \text{ belong to the same community} \\ \sqrt{\frac{2K}{N}} & \text{otherwise.} \end{cases}$$

By Lemma B.4, Z̄ can be chosen such that Y Z̄ = T. Let U be the matrix that solves inf_{U ∈ R^{K×K} : UUᵀ = UᵀU = I} ||Y Z̄ − Y Z U||_F. As U is orthogonal, ||T_i U − T_j U||_2 = ||T_i − T_j||_2. The following lemma is a direct consequence of Lemma 5.3 in Lei & Rinaldo (2015).

Lemma B.8. Let T and U be as defined above. For any ε > 0, let Θ̂ ∈ R^{N×K} be the assignment matrix⁴ returned by a (1 + ε)-approximate solution to the k-means clustering problem when the rows of Y Z are provided as input features, and let µ̂_1, µ̂_2, ..., µ̂_K ∈ R^K be the estimated cluster centroids. Define T̂ = Θ̂µ̂, where µ̂ ∈ R^{K×K} contains µ̂_1, ..., µ̂_K as its rows. Further, define δ = √(2K/N) and S_k = {v_i ∈ C_k : ||T̂_i − T_i|| ≥ δ/2}. Then,

δ² ∑_{k=1}^{K} |S_k| ≤ 8(2 + ε) ||TUᵀ − Y Z||²_F. (13)

Moreover, if γ from Lemma B.7 satisfies γ² > const(C, α)(2 + ε) pNK log N for a universal constant const(C, α), there exists a permutation matrix J ∈ R^{K×K} such that

(Θ̂J)_i = Θ_i, ∀ i ∈ [N] \ (∪_{k=1}^{K} S_k). (14)

Here, (Θ̂J)_i and Θ_i represent the iᵗʰ rows of the matrices Θ̂J and Θ, respectively.

Recall that M(Θ, Θ̂) = min_{J ∈ 𝒥} (1/N) ||Θ − Θ̂J||_0, where 𝒥 is the set of all K × K permutation matrices. In particular, for the matrix J used in Lemma B.8, M(Θ, Θ̂) ≤ (1/N) ||Θ − Θ̂J||_0. But, according to Lemma B.8,

||Θ − Θ̂J||_0 ≤ 2 ∑_{k=1}^{K} |S_k|.

Using Lemmas B.7 and B.8, we get

M(Θ, Θ̂) ≤ (1/N) ||Θ − Θ̂J||_0 ≤ (2/N) ∑_{k=1}^{K} |S_k| ≤ (16(2 + ε)/(Nδ²)) ||TUᵀ − Y Z||²_F ≤ const₃(C, α)² (512(2 + ε)/(Nδ²γ²)) pNK log N.

Noting that δ = √(2K/N) and setting const(C, α) = 256 · const₃(C, α)² finishes the proof.

⁴Θ̂_{ik} = 1 if the algorithm assigns vertex v_i to the kᵗʰ cluster and Θ̂_{ik} = 0 otherwise.
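As an aside, the error metric M(Θ, Θ̂) can be computed exactly in practice. The sketch below (ours, for illustration only) does so via an optimal assignment over cluster relabellings; each misassigned node flips two entries of Θ − Θ̂J, which gives the factor of two:

import numpy as np
from scipy.optimize import linear_sum_assignment

def misclustering_rate(labels_true, labels_pred, K):
    """(1/N) min_J ||Theta - Theta_hat J||_0 over K x K permutation matrices J."""
    N = len(labels_true)
    # confusion[a, b] = number of nodes predicted in cluster a with true cluster b
    confusion = np.zeros((K, K), dtype=int)
    for t, p in zip(labels_true, labels_pred):
        confusion[p, t] += 1
    rows, cols = linear_sum_assignment(-confusion)  # relabelling maximizing agreement
    agreements = confusion[rows, cols].sum()
    return 2 * (N - agreements) / N                 # each mistake flips two entries

# A relabelled but otherwise perfect clustering has rate 0:
assert misclustering_rate([0, 0, 1, 1], [1, 1, 0, 0], K=2) == 0.0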


C Proofs of Technical Lemmas from Appendix B

C.1 Proof of Lemma B.1

Because R is a d-regular graph, it is easy to see that R1 = d1. Recall from Section 5.2 that I − (1/N)11ᵀ is a projection matrix that removes the component of any vector x ∈ R^N along the all-ones vector 1. Thus, (I − (1/N)11ᵀ)1 = 0, and hence 1 ∈ null(R(I − (1/N)11ᵀ)). Moreover, using Assumption 2,

1ᵀu_k = ∑_{i=1}^{N} (u_k)_i = ∑_{i : v_i ∈ C_k} (u_k)_i + ∑_{i : v_i ∉ C_k} (u_k)_i = N/K − (1/(K − 1))(N − N/K) = 0.

Thus, R(I − (1/N)11ᵀ)u_k = Ru_k. Let us compute the iᵗʰ element of the vector Ru_k for an arbitrary i ∈ [N]:

(Ru_k)_i = ∑_{j=1}^{N} R_{ij}(u_k)_j = ∑_{j : R_{ij}=1, v_j ∈ C_k} 1 − ∑_{j : R_{ij}=1, v_j ∉ C_k} 1/(K − 1) = d/K − (1/(K − 1))(d − d/K) = 0.

Here, the second-to-last equality follows from Assumptions 2 and 3. Thus, Ru_k = 0, and hence u_k ∈ null(R(I − (1/N)11ᵀ)).

Because 1ᵀu_k = 0 for all k ∈ [K − 1], to show that 1, u_1, ..., u_{K−1} are linearly independent, it is enough to show that u_1, ..., u_{K−1} are linearly independent. Consider the iᵗʰ component of ∑_{k=1}^{K−1} α_k u_k for arbitrary α_1, ..., α_{K−1} ∈ R and i ∈ [N]. If v_i ∈ C_K, then

(∑_{k=1}^{K−1} α_k u_k)_i = −(1/(K − 1)) ∑_{k=1}^{K−1} α_k.

Similarly, when v_i ∈ C_{k′} for some k′ ∈ [K − 1], we have

(∑_{k=1}^{K−1} α_k u_k)_i = α_{k′} − (1/(K − 1)) ∑_{k=1, k≠k′}^{K−1} α_k.

Thus, ∑_{k=1}^{K−1} α_k u_k = 0 implies that −(1/(K − 1)) ∑_{k=1}^{K−1} α_k = 0 and α_{k′} − (1/(K − 1)) ∑_{k=1, k≠k′}^{K−1} α_k = 0 for all k′ ∈ [K − 1]. Subtracting the first equation from the second gives α_{k′} + (1/(K − 1)) α_{k′} = 0, which in turn implies that α_{k′} = 0 for all k′ ∈ [K − 1]. Thus, 1, u_1, ..., u_{K−1} are linearly independent.
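The claim Ru_k = 0 is easy to verify numerically. The sketch below (ours) uses a circulant d-regular graph in which every vertex has exactly d/K neighbours in each cluster; this construction is purely illustrative and is only one of many ways to satisfy Assumptions 2 and 3:

import numpy as np

N, K, d = 60, 3, 12                      # d/2 divisible by K
labels = np.arange(N) % K                # cluster of node i is i mod K
R = np.zeros((N, N))
for i in range(N):
    for off in range(1, d // 2 + 1):     # d/2 neighbours on each side
        R[i, (i + off) % N] = R[i, (i - off) % N] = 1.0

assert (R.sum(axis=1) == d).all()        # R is d-regular
# u_k is 1 on C_k and -1/(K-1) elsewhere, for k = 1, ..., K-1
u = np.where(labels[:, None] == np.arange(K - 1), 1.0, -1.0 / (K - 1))
print(np.abs(R @ u).max())               # 0.0: each u_k lies in null(R)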

C.2 Proof of Lemma B.2

Using the representation of Ā from (9), Lemma B.1, and Assumption 2, we get

Ā1 = qR1 + s(11ᵀ − R)1 + (p − q) ∑_{k=1}^{K} G_k R G_k 1 + (r − s) ∑_{k=1}^{K} G_k (11ᵀ − R) G_k 1
= qd1 + sN1 − sd1 + (r − s) ∑_{k=1}^{K} G_k 11ᵀ G_k 1 + [(p − q) − (r − s)] ∑_{k=1}^{K} G_k R G_k 1
= qd1 + s(N − d)1 + (r − s)(N/K)1 + [(p − q) − (r − s)](d/K)1
= [qd + s(N − d) + (p − q)(d/K) + (r − s)((N − d)/K)] 1.


Similarly, for any k ∈ [K − 1],

Āu_k = qRu_k + s(11ᵀ − R)u_k + (p − q) ∑_{k′=1}^{K} G_{k′} R G_{k′} u_k + (r − s) ∑_{k′=1}^{K} G_{k′} (11ᵀ − R) G_{k′} u_k
= 0 + 0 + (r − s) ∑_{k′=1}^{K} G_{k′} 11ᵀ G_{k′} u_k + [(p − q) − (r − s)] ∑_{k′=1}^{K} G_{k′} R G_{k′} u_k
= [(r − s)(N/K) + [(p − q) − (r − s)](d/K)] u_k
= [(p − q)(d/K) + (r − s)((N − d)/K)] u_k.

C.3 Proof of Lemma B.3

Recall that y_1 = (1/√N)1. We start by showing that y_1ᵀ y_{1+k} = 0:

y_1ᵀ y_{1+k} = (1/√N) ∑_{i=1}^{N} (y_{1+k})_i = (1/√N) [∑_{i : v_i ∈ C_k} (K − k) q_k − ∑_{i : v_i ∈ C_{k′}, k′ > k} q_k] = (1/√N) [(N/K)(K − k) q_k − (N − kN/K) q_k] = 0.

Here, q_k = 1/√((N/K)(K − k)(K − k + 1)). Now consider y_{1+k₁}ᵀ y_{1+k₂} for k₁, k₂ ∈ [K − 1] such that k₁ ≠ k₂. Assume without loss of generality that k₁ < k₂. Then,

y_{1+k₁}ᵀ y_{1+k₂} = ∑_{i : v_i ∈ C_{k₂}} (−q_{k₁})(K − k₂) q_{k₂} + ∑_{i : v_i ∈ C_k, k > k₂} (−q_{k₁})(−q_{k₂}) = −q_{k₁} q_{k₂} (K − k₂)(N/K) + q_{k₁} q_{k₂} (N − k₂N/K) = 0.

Finally, for any k ∈ [K − 1],

y_{1+k}ᵀ y_{1+k} = ∑_{i : v_i ∈ C_k} (K − k)² q_k² + ∑_{i : v_i ∈ C_{k′}, k′ > k} q_k² = q_k² [(N/K)(K − k)² + N − kN/K] = 1,

where the last equality follows from the definition of q_k.
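These identities are easy to confirm numerically. The short sketch below (ours, assuming equal-sized clusters as in the lemma) constructs y_1, ..., y_K and checks orthonormality:

import numpy as np

N, K = 40, 4
cluster = np.repeat(np.arange(1, K + 1), N // K)    # clusters 1, ..., K
Y = np.zeros((N, K))
Y[:, 0] = 1.0 / np.sqrt(N)                          # y_1
for k in range(1, K):                               # y_{1+k}, k = 1, ..., K-1
    q_k = 1.0 / np.sqrt((N / K) * (K - k) * (K - k + 1))
    Y[cluster == k, k] = (K - k) * q_k              # value on C_k
    Y[cluster > k, k] = -q_k                        # value on C_k', k' > k
print(np.abs(Y.T @ Y - np.eye(K)).max())            # ~1e-16: orthonormal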

C.4 Proof of Lemma B.4

Note that the columns of Z̄ are the eigenvectors of YᵀĀY corresponding to its K largest eigenvalues, as Z̄ is the solution to (6) with L set to L̄. The calculation below shows that for all k ∈ [K], the kᵗʰ standard basis vector e_k ∈ R^{N−r} is an eigenvector of YᵀĀY with eigenvalue λ_k, where λ_1, ..., λ_K were defined in Lemma B.2:

YᵀĀY e_k = YᵀĀ y_k = λ_k Yᵀ y_k = λ_k e_k.

The second equality follows from Lemma B.2 because y_2, ..., y_K ∈ span(u_1, ..., u_{K−1}), and u_1, ..., u_{K−1} are all eigenvectors of Ā with the same eigenvalue. To show that the columns of Y Z̄ lie in the span of y_1, ..., y_K, it is enough to show that e_1, ..., e_K are the top K eigenvectors of YᵀĀY.

Let α ∈ R^{N−r} be an eigenvector of YᵀĀY such that α ∉ span(e_1, ..., e_K) and ||α||₂² = 1. Then, because YᵀĀY is symmetric, αᵀe_1 = 0, i.e., α_1 = 0, where α_i denotes the iᵗʰ element of α. The eigenvalue corresponding to α is given by

λ_α = αᵀ YᵀĀY α.


Let x = Yα = ∑_{i=1}^{N−r} α_i y_i; then λ_α = xᵀĀx. Using the definition of Ā from (9), we get

xᵀĀx = (q − s) xᵀRx + s xᵀ11ᵀx + [(p − q) − (r − s)] ∑_{k=1}^{K} xᵀ G_k R G_k x + (r − s) ∑_{k=1}^{K} xᵀ G_k 11ᵀ G_k x. (15)

We consider each term in (15) separately. Before that, note that y_2, ..., y_{N−r} ∈ null(R). Indeed, because y_1 = (1/√N)1 and y_2, ..., y_{N−r} are orthogonal to y_1, we have (I − 11ᵀ/N) y_i = y_i, and hence

R y_i = R(I − 11ᵀ/N) y_i = 0, i = 2, 3, ..., N − r. (16)

Now consider the first term in (15):

xᵀRx = ∑_{i=1}^{N−r} ∑_{j=1}^{N−r} α_i α_j y_iᵀ R y_j = α_1² y_1ᵀ R y_1 = 0.

Here, the second equality follows from (16), and the third equality follows as α_1 = 0. Similarly, for the second term in (15),

xᵀ11ᵀx = N xᵀ y_1 y_1ᵀ x = N ∑_{i=1}^{N−r} ∑_{j=1}^{N−r} α_i α_j y_iᵀ y_1 y_1ᵀ y_j = N α_1² (y_1ᵀ y_1)² = 0.

Note that G_k = G_k G_k because G_k is a diagonal matrix with either 0 or 1 on its diagonal. For the third term in (15),

xᵀ G_k R G_k x = xᵀ G_k G_k R G_k G_k x = x_{[k]}ᵀ R_{[k]} x_{[k]} ≤ (d/K) ||x_{[k]}||₂², (17)

where x_{[k]} ∈ R^{N/K} contains the elements of x corresponding to the vertices in C_k, and R_{[k]} ∈ R^{(N/K)×(N/K)} is the submatrix of R restricted to the rows and columns corresponding to the vertices in C_k. The last inequality holds because R_{[k]} is the adjacency matrix of a d/K-regular graph by Assumption 3, and hence its maximum eigenvalue is d/K. Thus,

∑_{k=1}^{K} xᵀ G_k R G_k x ≤ (d/K) ∑_{k=1}^{K} ||x_{[k]}||₂² = (d/K) ||x||₂² = d/K.

Similarly, for the fourth term in (15),

xᵀ G_k 11ᵀ G_k x = xᵀ G_k G_k 11ᵀ G_k G_k x = x_{[k]}ᵀ 1_{N/K} 1_{N/K}ᵀ x_{[k]} ≤ (N/K) ||x_{[k]}||₂².

Here, 1_{N/K} ∈ R^{N/K} is the all-ones vector, and the last inequality holds because the maximum eigenvalue of 1_{N/K} 1_{N/K}ᵀ is N/K. Because x ∉ span(y_1, ..., y_K), there is at least one k ∈ [K] for which x_{[k]} is not a constant vector (if this were not true, x would belong to the span of y_1, ..., y_K). Thus, for at least one k ∈ [K], xᵀ G_k 11ᵀ G_k x < (N/K) ||x_{[k]}||₂². Summing over k ∈ [K], we get

∑_{k=1}^{K} xᵀ G_k 11ᵀ G_k x < (N/K) ∑_{k=1}^{K} ||x_{[k]}||₂² = (N/K) ||x||₂² = N/K.

Adding the four terms, we get the following bound: for any eigenvector α of YᵀĀY such that α ∉ span(e_1, ..., e_K) and ||α||₂² = 1,

λ_α = xᵀĀx < [(p − q) − (r − s)](d/K) + (r − s)(N/K) = λ_K. (18)

Thus, λ_1, ..., λ_K are the K largest eigenvalues of YᵀĀY, and hence e_1, ..., e_K are its top K eigenvectors. Consequently, the columns of Y Z̄ lie in the span of y_1, ..., y_K.


C.5 Proof of Lemma B.5

As D and D̄ are diagonal matrices, ||D − D̄|| = max_{i∈[N]} |D_{ii} − D̄_{ii}|. Applying the union bound, we get

P(max_{i∈[N]} |D_{ii} − D̄_{ii}| ≥ ε) ≤ ∑_{i=1}^{N} P(|D_{ii} − D̄_{ii}| ≥ ε).

We consider an arbitrary term in this summation. For any i ∈ [N], note that D_{ii} = ∑_{j≠i} A_{ij} is a sum of independent Bernoulli random variables such that E[D_{ii}] = D̄_{ii}. We consider two cases depending on the value of p.

Case 1: p > 1/2. By Hoeffding's inequality,

P(|D_{ii} − D̄_{ii}| ≥ ε) ≤ 2 exp(−2ε²/N).

Setting ε = √(2(α + 1)) √(pN log N), we get, for any α > 0,

P(|D_{ii} − D̄_{ii}| ≥ √(2(α + 1)) √(pN log N)) ≤ 2 exp(−4p(α + 1) log N) ≤ N^{−(α+1)},

where the last inequality uses p > 1/2.

Case 2: p ≤ 1/2. By Bernstein's inequality, as |A_{ij} − Ā_{ij}| ≤ 1 for all i, j ∈ [N],

P(|D_{ii} − D̄_{ii}| ≥ ε) ≤ 2 exp(−(ε²/2) / (∑_{j≠i} E[(A_{ij} − Ā_{ij})²] + ε/3)).

Also note that

E[(A_{ij} − Ā_{ij})²] = Ā_{ij}(1 − Ā_{ij})² + (1 − Ā_{ij})(−Ā_{ij})² = Ā_{ij}(1 − Ā_{ij}) ≤ Ā_{ij} ≤ p.

Thus,

P(|D_{ii} − D̄_{ii}| ≥ ε) ≤ 2 exp(−(ε²/2) / (Np + ε/3)).

Let ε = c√(pN log N) for some constant c > 0 and recall that p ≥ C log N / N for some C > 0. We get

2 exp(−(ε²/2) / (Np + ε/3)) = 2 exp(−(c² pN log N) / (2(Np + c√(pN log N)/3))) = 2 exp(−(c² log N) / (2(1 + (c/3)√(log N/(pN))))) ≤ 2 exp(−(c² log N) / (2(1 + c/(3√C)))).

Let c be such that c² / (2(1 + c/(3√C))) ≥ 2(α + 1);⁵ then

P(|D_{ii} − D̄_{ii}| ≥ ε) ≤ N^{−(α+1)}.

Thus, there always exists a constant const₁(C, α) that depends only on C and α such that, for all α > 0 and all values of p ≥ C log N / N,

P(|D_{ii} − D̄_{ii}| ≥ const₁(C, α) √(pN log N)) ≤ N^{−(α+1)}.

Applying the union bound over all i ∈ [N] yields the desired result.

⁵Such a c can always be chosen because lim_{c→∞} c² / (2(1 + c/(3√C))) = ∞.


C.6 Proof of Lemma B.6

Note that max_{i,j∈[N]} Ā_{ij} = p. Define g = pN = N max_{i,j∈[N]} Ā_{ij}, and note that g ≥ C log N as p ≥ C log N / N. By Theorem 5.2 in Lei & Rinaldo (2015), for any α > 0, there exists a constant const₂(C, α) such that

||A − Ā|| ≤ const₂(C, α) √g = const₂(C, α) √(pN)

with probability at least 1 − N^{−α}.

C.7 Proof of Lemma B.7

Because YᵀY = I, for any orthonormal matrix U ∈ R^{K×K} such that UUᵀ = UᵀU = I,

||Y Z̄ − Y Z U||²_F = ||Y(Z̄ − ZU)||²_F = trace{(Z̄ − ZU)ᵀ YᵀY (Z̄ − ZU)} = ||Z̄ − ZU||²_F.

Thus, it is enough to show an upper bound on ||Z̄ − ZU||_F, where, recall, the columns of Z̄ ∈ R^{(N−r)×K} and Z ∈ R^{(N−r)×K} contain the K leading eigenvectors of YᵀL̄Y and YᵀLY, respectively. Thus,

inf_{U ∈ R^{K×K} : UUᵀ = UᵀU = I} ||Y Z̄ − Y Z U||_F = inf_{U ∈ R^{K×K} : UUᵀ = UᵀU = I} ||Z̄ − ZU||_F.

By equation (2.6) and Proposition 2.2 in Vu & Lei (2013),

inf_{U ∈ R^{K×K} : UUᵀ = UᵀU = I} ||Z̄ − ZU||_F ≤ √2 ||Z̄Z̄ᵀ(I − ZZᵀ)||_F.

Moreover, ||Z̄Z̄ᵀ(I − ZZᵀ)||_F ≤ √K ||Z̄Z̄ᵀ(I − ZZᵀ)|| as rank{Z̄Z̄ᵀ(I − ZZᵀ)} ≤ K. Thus, we get

inf_{U ∈ R^{K×K} : UUᵀ = UᵀU = I} ||Z̄ − ZU||_F ≤ √(2K) ||Z̄Z̄ᵀ(I − ZZᵀ)||. (19)

Let µ_1 ≤ µ_2 ≤ ··· ≤ µ_{N−r} be the eigenvalues of YᵀL̄Y and α_1 ≤ α_2 ≤ ··· ≤ α_{N−r} be the eigenvalues of YᵀLY. By Weyl's perturbation theorem,

|µ_i − α_i| ≤ ||YᵀLY − YᵀL̄Y||, ∀ i ∈ [N − r].

Define γ = µ_{K+1} − µ_K as the eigen-gap between the Kᵗʰ and (K + 1)ᵗʰ eigenvalues of YᵀL̄Y.

Case 1: ||YᵀLY − YᵀL̄Y|| ≤ γ/4. In this case, |µ_i − α_i| ≤ γ/4 for all i ∈ [N − r] by the inequality above. Thus, α_1, α_2, ..., α_K ∈ [0, µ_K + γ/4] and α_{K+1}, α_{K+2}, ..., α_{N−r} ∈ [µ_{K+1} − γ/4, ∞). Let S = [0, µ_K + γ/4]; then µ_1, ..., µ_K ∈ S and α_{K+1}, ..., α_{N−r} ∉ S. Define

δ = min{|α_i − s| : α_i ∉ S, s ∈ S}.

Then, δ ≥ [µ_{K+1} − γ/4] − [µ_K + γ/4] = γ/2. By the Davis–Kahan sin Θ theorem,

||Z̄Z̄ᵀ(I − ZZᵀ)|| ≤ (1/δ) ||YᵀLY − YᵀL̄Y|| ≤ (2/γ) ||YᵀLY − YᵀL̄Y||.

Case 2: ||YᵀLY − YᵀL̄Y|| > γ/4. Note that ||Z̄Z̄ᵀ(I − ZZᵀ)|| ≤ 1, as

||Z̄Z̄ᵀ(I − ZZᵀ)|| ≤ ||Z̄Z̄ᵀ|| ||I − ZZᵀ|| = 1 · 1 = 1.

Thus, if ||YᵀLY − YᵀL̄Y|| > γ/4, then

||Z̄Z̄ᵀ(I − ZZᵀ)|| ≤ (4/γ) ||YᵀLY − YᵀL̄Y||.

In both cases, ||Z̄Z̄ᵀ(I − ZZᵀ)|| ≤ (4/γ) ||YᵀLY − YᵀL̄Y||. Using (11), (12), and (19), we get, with probability at least 1 − 2N^{−α},

inf_{U ∈ R^{K×K} : UUᵀ = UᵀU = I} ||Z̄ − ZU||_F ≤ const₃(C, α) (4√(2K)/γ) √(pN log N).
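The Davis–Kahan step in Case 1 is easy to illustrate numerically. The sketch below (ours; it uses an arbitrary diagonal test matrix rather than YᵀL̄Y) checks that the subspace distance is controlled by 2||E||/γ when the perturbation E is small relative to the eigen-gap:

import numpy as np

rng = np.random.default_rng(2)
n, K, gamma = 50, 3, 1.0
M = np.diag(np.r_[np.zeros(K), np.ones(n - K)])      # eigen-gap gamma = 1
E = rng.standard_normal((n, n)); E = 0.005 * (E + E.T)
Zb = np.linalg.eigh(M)[1][:, :K]                     # bottom-K eigenspace of M
Z = np.linalg.eigh(M + E)[1][:, :K]                  # same for the perturbation
lhs = np.linalg.norm(Zb @ Zb.T @ (np.eye(n) - Z @ Z.T), 2)
normE = np.linalg.norm(E, 2)
print(normE <= gamma / 4, lhs <= 2 * normE / gamma)  # True True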


Figure 4: Comparison across algorithms for the setting with synthetically generated d-regular representation graphs. The three panels plot the accuracy of SC, FairSC, Gen-FairSC, and Gen-FairSC (approx.): (a) accuracy vs number of nodes (degree 40, 5 clusters); (b) accuracy vs number of clusters (2000 nodes, degree 40); (c) accuracy vs degree (2000 nodes, 5 clusters). In all panels, FairSC uses P = 0.1N protected groups and Gen-FairSC (approx.) uses rank r = 0.1N.

C.8 Proof of Lemma B.8

Equation (13) directly follows from Lemma 5.3 in Lei & Rinaldo (2015). We only need to show that

(8(2 + ε)/δ²) ||TUᵀ − Y Z||²_F < N/K;

(14) then also follows from Lemma 5.3 in Lei & Rinaldo (2015). Recall that δ = √(2K/N). Using Lemma B.7, we get

(8(2 + ε)/δ²) ||TUᵀ − Y Z||²_F ≤ const₃(C, α)² (128(2 + ε)/γ²) pN² log N < N/K.

Here, the last inequality follows from the assumption that γ² > const₃(C, α)² · 128(2 + ε) pNK log N.

D Additional Experiments

In this section, we study the variation in the performance of our algorithm as a function of quantities such as the number of nodes, the number of clusters, and network sparsity. As in Section 7, this section is divided into two parts, covering synthetic and real-world networks.

D.1 Synthetic Networks

For the first set of experiments, we sample d-regular representation graphs that satisfy Assumptions 2 and 3. Figure 1 in the paper compares the performance of Gen-FairSC with FairSC (Kleindessner et al., 2019b) and spectral clustering without fairness constraints (SC) for various values of N, fixing K = 5 and d = N/10. Recall that N is the number of nodes, K is the number of clusters, and d is the common degree of the nodes in the sampled d-regular representation graphs.

Figure 4a also shows accuracy as a function of N. However, as opposed to the previous case, here we set d = 40 instead of d = 0.1N. Thus, as N increases, the representation graph becomes increasingly sparse. As before, K = 5. We run FairSC with P = 0.1N protected groups that are discovered using the observed adjacency matrix, as explained in Section 7. We also run the approximate variant of Gen-FairSC that was used for the real-data experiments in Section 7: we find the best rank-r approximation of the representation graph and use it instead of the original R, as sketched below. Figure 4a uses r = 0.1N. It reports the mean and standard deviation of accuracy across 10 independent runs in each case. Recall that the ground-truth clusters are fair by construction; hence, high accuracy implies high fairness. Gen-FairSC outperforms all other algorithms, and Gen-FairSC (approx.) performs better than FairSC and SC.
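For concreteness, the rank-r approximation step can be implemented as follows (a minimal sketch of ours; the function name and the use of a dense eigendecomposition are illustrative assumptions, not the paper's implementation):

import numpy as np

def low_rank_approx(R, r):
    """Best rank-r approximation of a symmetric matrix R (Eckart-Young)."""
    vals, vecs = np.linalg.eigh(R)              # R is symmetric
    top = np.argsort(np.abs(vals))[-r:]         # r largest |eigenvalues|
    return (vecs[:, top] * vals[top]) @ vecs[:, top].T

# e.g., with r = 0.1N, one would use low_rank_approx(R, int(0.1 * N)) in place of R.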


Figure 5: Comparison between Gen-FairSC (approx.), FairSC, and spectral clustering without fairness (SC) using representation graphs generated from an SBM. Each panel plots the ratio of average balance to ratio-cut against the number of protected groups P (for FairSC) and the rank r of the representation graph (for Gen-FairSC (approx.)), for the six combinations of N ∈ {1000, 2000, 3000} and K ∈ {4, 8}. In all cases, a rank r can be chosen for Gen-FairSC (approx.) to get clusters with high balance and low ratio-cut. However, FairSC yields poor performance across various values of P. The difference in performance is especially significant when K is large. These plots use p_in = 0.8, p_out = 0.2, and five protected groups in the SBM used for sampling the representation graphs (see Section 7.1).

Figure 4b plots accuracy against the number of ground-truth clusters K. It uses N = 2000, d = 40, P = 0.1N (for FairSC), and r = 0.1N (for Gen-FairSC (approx.)). Crucially, the performance of Gen-FairSC does not change as K increases, whereas the performance of all the other algorithms degrades substantially. Figure 4a may suggest that even standard spectral clustering returns fair clusters for a large enough graph. However, Figure 4b shows that this is not true if the number of clusters also increases, which is more common in practice.

Figure 4c shows the variation in performance with the degree d of the representation graph. We used N = 2000, K = 5, P = 0.1N (for FairSC), and r = 0.1N (for Gen-FairSC (approx.)) to generate this figure. Once again, both Gen-FairSC and its approximate variant outperform the other algorithms and are also robust to changes in the degree of the representation graph. While the performance of SC also appears to be robust to changes in d, FairSC performs poorly for higher values of d.

It may be tempting to think that FairSC would perform well with a more carefully chosen value of P, the number of protected groups. Figure 6 shows that this is not the case.


Figure 6: Accuracy vs the number of protected groups P used by FairSC and the rank r used by Gen-FairSC (approx.) for d-regular representation graphs (800 nodes, degree 40, 8 clusters); SC, FairSC, Gen-FairSC, and Gen-FairSC (approx.) are shown.

Figure 6 plots the performance of FairSC as a function of the number of protected groups P, together with the performance of the approximate variant of Gen-FairSC for various values of the rank r used in approximating the representation graph. As expected, the accuracy increases with r, as Gen-FairSC (approx.) gets to use a better approximation of R.

The next set of experiments samples representation graphs from an SBM, as described in Section 7.1. As noted before, representation graphs sampled this way are not d-regular. We always use the approximate variant of Gen-FairSC in these experiments, as a sampled R may violate Assumption 1, which is necessary to ensure that (6) has a solution. Figure 2 in the paper shows the variation in the performance of Gen-FairSC (approx.) and FairSC with P and r for a fixed value of N and K (P and r have the same meaning as above; see Section 7 for more details about the notation and the sampling process). Figure 5 compares Gen-FairSC (approx.) and FairSC for more combinations of N and K. Every sub-plot shows the ratio of the average balance ρ to the ratio-cut objective defined in Section 2 on the y-axis. We want to find clusters with high balance and low ratio-cut; thus, higher values on the y-axis are more desirable.

D.2 Real-World Network

We use the FAO trade network for the next set of experiments; see Section 7.2 for details about the construction of this network. Figure 3 fixes K = 2 and compares Gen-FairSC (approx.) with FairSC for different values of r and P used by Gen-FairSC (approx.) and FairSC, respectively. Figure 7 makes the same comparison for other values of K. The y-axis shows the ratio of the average balance ρ to the value of the ratio-cut objective, and the x-axis corresponds to different values of P and r. As before, higher values on the y-axis are desirable.

E Individual vs Statistical Fairness: A Toy Example

In this section, we present a toy example to distinguish between statistical and individual fairness. Recall that a commonly used notion of statistical fairness, called proportional representation (Chierichetti et al., 2017; Kleindessner et al., 2019b), assumes that the nodes belong to protected groups P_1, P_2, ..., P_P and requires

|C_k ∩ P_p| / |C_k| = |P_p| / N, for all p ∈ [P], k ∈ [K].

The algorithm FairSC proposed in Kleindessner et al. (2019b) tries to achieve statistical fairness. In contrast, our algorithm Gen-FairSC supports the fairness notion specified in Definition 3.1, which includes statistical fairness as a special case (see Section 3) but can more generally enforce individual fairness.

Consider the representation graph specified in Figure 8. All nodes have a self-loop associated with them, which is not shown for clarity. In this example, N = 24, K = 2, and every node is connected to d = 6 nodes (including the self-loop). The color of a node indicates its ground-truth cluster membership. It is easy to verify that the ground-truth clusters are fair with respect to this representation graph as per Definition 3.1, as each node has three neighbors in both the red and the blue cluster.


Figure 7: Comparing the performance of Gen-FairSC (approx.) and FairSC on the FAO trade network for different values of K. The eight panels correspond to K = 2, 3, ..., 9; each plots the ratio of average balance to ratio-cut against the number of protected groups P (for FairSC) and the rank r of the representation graph (for Gen-FairSC (approx.)). Gen-FairSC (approx.) outperforms FairSC.

Our consistency result shows that Gen-FairSC recovers the fair ground-truth clusters for such d-regular graphs with high probability.

On the other hand, to apply FairSC to this problem, we first need to cluster the nodes in the representation graph to find the protected groups P_1, P_2, ..., P_P. A natural choice is to use two protected groups, P_1 = {1, 2, ..., 12} and P_2 = {13, 14, ..., 24}. However, clustering the nodes based on these protected groups can produce clusters where, for instance, all nodes on the yellow background are in C_1 and all nodes on the green background are in C_2. It is easy to verify that this clustering satisfies the statistical fairness criterion given above. However, it is very unfair from the perspective of each individual.



Figure 8: A small 24-node representation graph. Circles represent nodes and lines represent edges. The color of a circle indicates the ground-truth cluster membership of the corresponding node. The rectangles enclose nodes that may be clustered together by FairSC. See Appendix E for a description of why the clusters recovered by FairSC can be statistically fair but highly unfair from each individual's perspective.

For example, only one of the six neighbors of node 1 is in C_2, despite the two clusters having equal size. Thus, node 1 does not have enough representation in C_2. A similar argument can be made for every node in this graph.

This example illustrates how a clustering can be highly unfair from each individual's perspective despite being statistically fair. Gen-FairSC can recover the individually fair clusters, but FairSC cannot.
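To make the contrast concrete, the following sketch (ours; the function names and the neighbourhood-proportionality reading of Definition 3.1 are our assumptions) checks the two criteria for any representation graph R, given as an adjacency matrix with self-loops, a clustering, and a protected-group assignment:

import numpy as np

def statistically_fair(cluster, group, tol=1e-9):
    """Proportional representation: |C_k ∩ P_p| / |C_k| = |P_p| / N for all k, p."""
    for p in np.unique(group):
        share = np.mean(group == p)                  # |P_p| / N
        for k in np.unique(cluster):
            if abs(np.mean(group[cluster == k] == p) - share) > tol:
                return False
    return True

def individually_fair(R, cluster, tol=1e-9):
    """Each node's neighbours are spread across clusters like the cluster sizes."""
    for i in range(len(cluster)):
        nbrs = cluster[R[i] > 0]                     # neighbours of node i under R
        for k in np.unique(cluster):
            if abs(np.mean(nbrs == k) - np.mean(cluster == k)) > tol:
                return False
    return True

On the example of Figure 8, the FairSC-style clustering described above passes the first check but fails the second.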