1 a deterministic equivalent for the capacity analysis …jack/multicell_journal716.pdf3 rates as a...

44
1 A deterministic equivalent for the capacity analysis of correlated multi-user MIMO channels Romain Couillet ,, M´ erouane Debbah , Jack Silverstein Abstract This paper provides the analysis of capacity expressions for multi-user and multi-cell wireless communication schemes when the transmitters and the receivers have a large number of correlated antennas. Our main contribution mathematically translates into a deterministic equivalent of the Shannon transform of a class of large dimensional random matrices; the latter are sums of Gram matrices with separable variance profiles, the covariance matrices of which possibly have asymptotically large eigenvalues. On the applicative side, this class of large matrices is used in this contribution to model (i) multi-antenna multiple access (MAC) and broadcast channels (BC) with transmit and receive channel correlation, (ii) multiple-input multiple-output (MIMO) communications with inter-cell interference and channel correlation both at the base stations and at the receivers. The theoretical capacity formulas obtained for these models extend the classical results on multi-user MIMO capacities in independent and identically distributed (i.i.d.) Gaussian channels to the more realistic Gaussian channels with separable variance profile. On an information theoretical viewpoint, this article provides: in scenario (i), an asymptotic description of the MAC and BC rate regions as a function only of the transmit and receive correlation matrices (thus independently of the random channel realizations); an asymptotic expression of the capacity-maximizing covariance matrices in the uplink MAC and downlink BC; a water-filling algorithm which, upon convergence, is proved to converge to the capacity-achieving power allocation at the transmitters both in MAC and BC. In scenario (ii), the article provides: an expression of the single-user decoding capacity and minimum mean square error (MMSE) decoding capacity when interference is treated as Gaussian noise with a known variance profile; for single-user decoding, the capacity-achieving antenna power allocation policy at the transmitter. I. I NTRODUCTION When mobile networks were some time ago expected to run out of power and frequency resources while being simultaneously subject to an increasing demand of achievable data rates, Foschini [6] and Telatar [7] introduced the notion of multiple input multiple output (MIMO) systems and predicted a growth of capacity performance of min(N,n) for a communication between an n-antenna transmitter and an N -antenna receiver when the propagation ST-Ericsson, 505 Route des Lucioles, 06560 Sophia Antipolis Alcatel-Lucent Chair on Flexible Radio, Sup´ elec, 3 rue Joliot-Curie, 91192 Gif sur Yvette Department of Mathematics, North Carolina State University, Raleigh, North Carolina 27695-8205

Upload: doandieu

Post on 17-Apr-2018

216 views

Category:

Documents


2 download

TRANSCRIPT

1

A deterministic equivalent for the capacity

analysis of correlated multi-user MIMO

channelsRomain Couillet†,‡, Merouane Debbah‡, Jack Silverstein⊥

Abstract

This paper provides the analysis of capacity expressions for multi-user and multi-cell wireless communication

schemes when the transmitters and the receivers have a largenumber of correlated antennas. Our main contribution

mathematically translates into a deterministic equivalent of the Shannon transform of a class of large dimensional

random matrices; the latter are sums of Gram matrices with separable variance profiles, the covariance matrices of

which possibly have asymptotically large eigenvalues. On the applicative side, this class of large matrices is used in

this contribution to model (i) multi-antenna multiple access (MAC) and broadcast channels (BC) with transmit and

receive channel correlation, (ii) multiple-input multiple-output (MIMO) communications with inter-cell interference

and channel correlation both at the base stations and at the receivers. The theoretical capacity formulas obtained for

these models extend the classical results on multi-user MIMO capacities in independent and identically distributed

(i.i.d.) Gaussian channels to the more realistic Gaussian channels with separable variance profile. On an information

theoretical viewpoint, this article provides: in scenario(i), an asymptotic description of the MAC and BC rate regionsas

a function only of the transmit and receive correlation matrices (thus independently of the random channel realizations);

an asymptotic expression of the capacity-maximizing covariance matrices in the uplink MAC and downlink BC; a

water-filling algorithm which, upon convergence, is provedto converge to the capacity-achieving power allocation at

the transmitters both in MAC and BC. In scenario (ii), the article provides: an expression of the single-user decoding

capacity and minimum mean square error (MMSE) decoding capacity when interference is treated as Gaussian noise

with a known variance profile; for single-user decoding, thecapacity-achieving antenna power allocation policy at

the transmitter.

I. I NTRODUCTION

When mobile networks were some time ago expected to run out ofpower and frequency resources while being

simultaneously subject to an increasing demand of achievable data rates, Foschini [6] and Telatar [7] introduced

the notion of multiple input multiple output (MIMO) systemsand predicted a growth of capacity performance of

min(N, n) for a communication between ann-antenna transmitter and anN -antenna receiver when the propagation

†ST-Ericsson, 505 Route des Lucioles, 06560 Sophia Antipolis‡Alcatel-Lucent Chair on Flexible Radio, Supelec, 3 rue Joliot-Curie, 91192 Gif sur Yvette⊥Department of Mathematics, North Carolina State University, Raleigh, North Carolina 27695-8205

2

channel matrix model is formed of independent and identically (i.i.d.) Gaussian entries. However, in practical

systems, this tremendous multiplexing gain can only be provided for large signal-to-interference plus noise ratios

(SINR) and for uncorrelated transmit and receive antenna arrays at both communication sides. In present wireless

mobile networks, the scarcity of available frequency resources has led to a widespread incentive for MIMO

communications. Due to space limitations, mobile designers now embed more and more antennas in small devices,

which inevitably spawns non-negligible correlation patterns at the antenna arrays and thus non-negligible effects on

the achievable transmission rates. Since MIMO systems comealong with a tremendous increase in signal processing

requirements (which naturally imply a large non-linear increase in power consumption), both base station and mobile

manufacturers need to accurately assess the exact cost of bit rate increases on finite size devices, prone to experience

strong transmit and receive correlation. Our intention is to evaluate the antenna efficiency, i.e. the mean per-antenna

achievable rate, for different communication models detailed in the following.

Multi-cell and multi-user systems are among the scenarios of main interest to cellular service providers. The scope

of the present study (which, we will see, can be extended to a larger set of mobile communications scenarios) lies

in the following two wireless communication systems, whichthe authors consider of most valuable interest,

1) the multiple access channels (MAC) in whichK transmitters, hereafter assimilated to mobile terminal users,

transmit information to a unique receiver, hereafter referred to as the base station; and the dual broadcast

channels (BC) in which the base station multi-casts information to theK users. While the major scientific

breakthroughs in multi-antenna broadcast channels are quite recent, see e.g. [20], the practical applications

are foreseen to arise in a near future with the so-called multi-user MIMO techniques to be used in future

long term evolution standards, e.g. [1].

2) the single-user decoding and minimum mean square error (MMSE) decoding [8] in multi-cell scenarios. In

most current mobile communication systems, the wireless networks are composed of multiple overlapping

cells, controlled by non-cooperating base stations. In these conditions, the achievable rates for every user in

a cell, assuming no intra-cell interference, corresponds to the capacity of the single-user decoding scheme in

which interfering signals are treated as Gaussian noise with a known correlation pattern. However, single-user

decoders are not linear decoders and are often replaced in practical applications by the simpler linear MMSE

decoders; these decoders maximize the signal-to-interference plus noise ratio (SINR) at the receiver.

The achievable rate region of MIMO MAC and BC channels for generic channels have been known since the

successive contributions [20]-[21], who established an important duality link between MAC rate regions and BC

rate regions in vector channels, both in single antenna and MIMO channels. These important contributions provide

the general descriptions of the rate regions under fixed wireless channels, without any underlying transmission

model. As a consequence, the exact expressions obtained areoften intractable and generally do not provide any

insight of the various channel parameters involved on the resulting achievable data rate. The mathematical field of

large random matrices allows one to circumvent this issue byproviding closed-form approximations of achievable

3

rates as a function of the most relevant channel parameters only1. The most notable result in the line of the present

study for random channels is due to Tulino [2], which provides an asymptotic capacity expression of point-to-point

MIMO systems when the channel random matrix is composed of i.i.d. Gaussian entries. This result naturally extends

to multiple users by dividing the MIMO channel matrix intoK sub-matrices; this therefore allows one to obtain

a description of the MAC and BC rate regions for the uncorrelated Gaussian channels, which in shown in [2] to

depend only (i) on the signal to noise ratio (SNR) at the receivers and (ii) on the ratioc = N/n between the

numberN of transmit antennas at the base station and the numbern of receive antennas at all receivers. Tulino

also provides an expression of the capacity-achieving power allocation policy at the base station. In [24], Ulukus

derives the capacity-achieving power allocation policy for a finite number of antennas at all transmit/receive devices

in the case ofK users whose channelsHk, 1 ≤ k ≤ K, are modelled as Kronecker channels, i.e. Gaussian with

separable variance profile,Hk = R12

k XkT12

k , whereRk, Tk are Hermitian nonnegative definite2 andXk is random

with i.i.d. Gaussian entries; however, Ulukus (i) does not provide a theoretical large dimensional expression of the

resulting capacity, (ii) makes the strong assumption that all Rk matrices are equal. It is in fact rather straightforward

to observe that, under the assumption that all allRk share the same eigenspace, the result from [2] is extensible

to the Kronecker channel model, see e.g. [3]. When theRk matrices have no trivial relationship, the problem

is more complex and requires different mathematical tools.Those tools allow us in the present work to obtain a

deterministic equivalent of the per-antenna rate regions associated to the channelsHk: this is our main result, which

we provide in Theorem 2. In fact, it is important to note that the final formula of Theorem 2 is already found and

used by Chen, Equation (32) in [11]; however, the latter is provided without any proof, nor any hypotheses on the

considered matrices, and merely stems from the previous Equation (6) which is in valid when allRk matrices have

the same eigenspace, but which is incorrect in the general case, as will become clear in the following. Chen also

provides the iterative water-filling algorithm which we will also obtain in the course of this paper (see Table I);

however, the convergence of this algorithm to the correct capacity, which we will prove, is not provided in [11].

Regarding multi-cell networks, to the authors’ knowledge,few contributions treat simultaneously the problem of

multi-cell interference in more structured channel modelsthan i.i.d. Gaussian channels. In [13], the authors carry out

the performance analysis of TDMA-based networks with inter-cell interference. In [14], a random matrix approach

is used to study large CDMA-based networks with inter-cell interference. In our particular MIMO context, it is

important to mention the work of Moustakas [10] who conjectures an analytic solution to the single-user decoding

problem with channel correlation and interference with known variance profile, using the replica method [12].3

Practical channel models, as recalled earlier, prove more challenging than mere i.i.d. Gaussian channels. Small

distances between antennas embedded in the wireless devices as well as solid angles of transmission and reception

of signal energy tend to correlate (potentially strongly) the emitted and received signals. A largely spread channel

1such as the transmit/receive covariance matrices, the deterministic line of sight components etc.

2R12k

andT12k

are then defined as their unique nonnegative definite square root.

3we use here the term ‘conjecture’ instead of ‘demonstrate’ since the exactness of the replica method has not yet been mathematically proven.

4

model, which naturally unfolds from the knowledge of a correlation pattern at the transmitter and at the receiver

[35] is the so-called Kronecker model, which mathematically translates into random matrices with i.i.d. Gaussian

entries with separable correlation profile, as previously recalled. This model is of particular interest when no line

of sight component is present in the channel and when a sufficiently large number of scatterers is found in the

communication medium. This is the model we will consider in the following study. However note that, in their

substantial contributions [26]-[27], Loubaton et al. provide a deterministic equivalent of capacity of a point-to-point

MIMO channel [26] and its corresponding capacity-achieving input covariance matrix [27] when the channel matrix

H is modelled as Ricean, i.e.H = A + X whereA is some deterministic matrix, standing for the line-of-sight

component, andX is Gaussian with general variance profile (E[|Xij|2] = σij ). Of particular practical interest is

also the theoretical work of Tse [9] on MIMO point-to-point capacity in both uncorrelated and correlated channels,

which are validated by ray-tracing simulations.

The main contribution of this paper summarizes into two major mathematical theorems, contributing to the field

of random matrix theory, and an important water-filling algorithm enabling to describe the boundaries of the MAC

and BC rate regions. On a purely mathematical viewpoint, we provide a deterministic equivalent of the Stieltjes

transform of the sumBN of K Gaussian matrices with separable variance profileR12

k XkTkXH

kR12

k , Rk, Tk

nonnegative Hermitian,Xk i.i.d. and unitary invariant, plus a deterministic nonnegative Hermitian matrixS,

BN =

K∑

k=1

R12

k XkTkXH

kR12

k + S (1)

This result extends the recent results from Taricco [5] and Honig [3] in which, respectively, allRk matrices are

equal or share the same eigenspace. The main consequence of letting theRk matrices unconstrained is that no

limit (for large N ) eigenvalue distribution ofBN is available, even when theRk and Tk matrices have a limit

eigenvalue distribution. Therefore, classical results from random matrix theory are not usable here. The second major

result lies in a deterministic equivalent of the Shannon transform ofBN , obtained by integration of the deterministic

equivalent of the Stieltjes transform. This result extends[2] to multiple channels with separable variance profile. For

both deterministic equivalents of the Stieltjes and Shannon transforms, and contrary to most previous contributions

in similar articles, we do not restrict here the eigenvaluesof the Rk andTk matrices to be uniformly (over the

matrix sizes) bounded. In the present contribution, those eigenvalues can theoretically grow at a restricted rate, to

be defined later. In practice, this allows to study a wider scope of channel models than in the previously mentioned

studies; in particular, this study covers the usual case when theRk andTk matrices are constrained bytr(Rk) = N ,

tr(Tk) = nk (N andnk the respective sizes ofRk andTk).

The remainder of this paper is structured as follows: in Section II, we provide a quick summary of our results

and discuss the extent of their applicability. In Section III, we provide the two main mathematical theorems needed

for practical applications in the subsequent sections. Thecomplete proofs of both theorems are provided in the

appendix. In Section IV, the rate region of MAC and BC channels and the capacity of single-user decoding and

MMSE decoding with inter-cell interference are studied. Inthis section, we will introduce our third main result:

an iterative water-filling algorithm to describe the boundary of the MAC and BC rate regions. In Section V, we

5

provide simulation results of the previously derived theoretical formulas and discuss the advantages and limitations

of the deterministic equivalents. Finally, in Section VI, we give our conclusions.

Notation: In the following, boldface lower-case symbols represent vectors, capital boldface characters denote

matrices (IN is theN ×N identity matrix).Xij denotes the(i, j) entry ofX. The Hermitian transpose is denoted

(·)H. The operatorstrX, |X| and‖X‖ represent the trace, determinant and spectral norm of matrix X, respectively.

The symbolE[·] denotes expectation. The notationFY stands for the empirical distribution of the eigenvalues of

the Hermitian matrixY. The function(x)+ equalsmax(x, 0) for real x. For F , G two distribution functions, we

denoteF ⇒ G the vague convergence ofF to G.

II. SCOPE ANDSUMMARY OF MAIN RESULTS

In this section, we summarize the main results of this paper and explain how they naturally help to study, in the

present multi-cell multi-user framework, the effects of channel correlation on the antennarate efficiency, which we

define as the mean achievable rate provided by each transmit/receive antenna.

A. General Model

Consider a set ofK wireless entities which purpose is to communicate with another wireless device, either

in downlink or in uplink, at the ‘best’ possible rate performance4. For clarity and without generality restriction,

consider here the uplink scenario. DenoteHk the channel matrix model between thekth transmitterk and the

receiver. Then, assuming a large number of scatterers in themedium between both entities and no line-of-sight

vision, it is often accurate to consider thatHk is modelled as Kronecker,

Hk = R12

k XkT12

k (2)

where, as already mentioned, theN×N matrixR12

k and thenk×nk matrixT12

k are respectively the only nonnegative

square roots of the Hermitian nonnegative matricesRk and Tk, andXk is a realization of a random Gaussian

matrix5. The matricesTk andRk in this scenario model the correlation present in signals transmitted by userk

and signals received at the receiver. It is important to stress out that those correlation patterns emerge both from

the one-to-one antenna spacings on the volume limited devices and from the solid angles of useful transmitted

and received energy. Without this second factor, it would make sense that allRk are equals, which was claimed

for instance in [24]. However, this would mean that signals are received isotropically at the receiver, which turns

out to be often too strong an assumption to characterize practical communication channels. This being said, the

only sensible restriction that one can make on the matricesRk andTk is for their diagonal entries to be less or

equal to one, and then for their spectral norms to be less thanN andnk respectively. We will see that our results

require different hypothesis, which nonetheless allow to treat some channel models with strong correlations. The

4we purposely are cryptic on the term ‘best’ for one might consider very different performance criteria, such as achievable sum-rate, max-min

rate etc.

5Gaussian matrices often refer to random matrices with i.i.d. Gaussian entries.

6

assumption often considered, e.g. in [26]-[27], is that thespectral norms ofRk andTk, for all k, are uniformly

bounded (over the matrix sizes), which means that only low correlation patterns can be studied; in the present work,

we will allow a possibly large portion of the eigenvalues ofRk, Tk to grow at a potentially high rate, along with

growing N , nk. More precisely, we provide a sufficient (but not necessary)condition on the number and growing

rate of the eigenvalues ofRk andTk which ensures the asymptotic accuracy of the deterministicequivalents under

study. Under this condition, a wider scope of channel correlation models can be studied than in most previous

contributions, including the classical case whentr(Rk) = N , tr(Tk) = nk.

As will be evidenced by the information theoretic applications of Sections IV-A and IV-B, most multi-cell or

multi-user capacity performance rely more or less directlyon the so-called Stieltjes transformmN (z) of matrices

BS

N of the type

BS

N =∑

k∈S

R12

k XkTkXH

kR12

k (3)

for some given subsetS of {1, . . . , K}.

The Stieltjes transformmN of the N × N Hermitian matrixBS

N is defined overC \ R+ as

mN (z) =

∫1

λ − zdFBS

N (λ) =1

Ntr(BS

N − zIN

)−1(4)

Of importance is the Shannon transform, denotedV(x), of BS

N , which we define, forx > 0, as

V(x) =1

Nlog det

(IN +

1

xBS

N

)(5)

=

∫ +∞

0

log(1 + λx−1)dFBS

N (λ) (6)

=

∫ +∞

x

(1

w− mN (−w)

)dw (7)

B. Main results

The main results of this work related to random matrix theorycome as follows,

• we first present a theorem, namely Theorem 1, which provides adeterministic equivalent of the Stieltjes

transform of generalizedBS

N -like matrices, under the assumption that, with growingN and nk (with non

trivial ratio 0 < c = N/n ≪ N ) the sequences{FTk}nkand {FRk}N are tight. This is, we provide an

approximation ofmN(z) which does not depend on the realization of theXk matrices. The tight sequence

assumption allows some degenerated cases of very strong correlation inRk andTk, which exhibit an eigenvalue

of order of magnitudeN .6

• we then provide in Theorem 2 a deterministic equivalentV(0)(x) of the Shannon transformV(x) of BS

N . For

this theorem, the assumptions on theRk and Tk matrices are more constraining, since we need to assume

here that the maximum numberrN of the eigenvalues ofTk, Rk which grow unbounded and their maximum

6remark, for instance, that the scenario in which the device volume is fixed while the number of antennasN grows inside this volume, is

prone to exhibit such degenerated behaviour. This case can therefore be treated here.

7

valuebN satisfy log(1 + b2Nβ/x)rN = o(N) for some constantβ to be defined in Theorem 2. However, this

assumption is largely less restrictive than the hypothesisof uniformly (with respect toN ) bounded spectral

norms ofTk andRk. For instance, we can theoretically allow

– the matricesTk andRk to verify tr(Rk) = O(N), tr(Tk) = O(N), which is of practical interest here.

– all but a finite number of the eigenvalues ofTk, Rk to be uniformly bounded, the largest of them growing

like eN . For our specific study, this case is less relevant though.

The major practical interest of Theorems 1 and 2 lies in the possibility to analyze capacity expressions, no longer

as stochastic variables depending on the matricesXk but as approximations of deterministic quantities. The study of

those quantities are in general simpler than the study of thestochastic expressions, even if the deterministic results

are actually solutions of involved implicit equations (seeSection III). In particular, remember that our problematic

introduced in Section I is to study the trade-off ‘capacity gain’ versus ‘cost’ of additional transmit/receive antennas.

For this reason, the typical figures of performance sought for are the per-(transmit or receive) antenna normalized

capacity, sum rate or rate region, i.e. the mean efficiency ofan antenna of the transmit/receive array. Those are

again related to the Stieltjes and Shannon transform ofBS

N -like matrices. However, we will not provide in this

study asymptotic sum-capacity expressions, i.e.N times the Shannon transform, for which asymptotic accuracyof

the deterministic equivalents cannot be verified; on this topic, the reader is strongly advised to refer to [27].

In such practical applications as the rate region of BC, the Shannon transform expressions (5) are needed to

calculate the corner points in the rate region; because of MAC-BC duality, to determine the BC rate region, one

needs to treat the dual MAC uplink problem, see Section IV-A;in order to decouple the influence of antenna

correlation and transmit covariance matrices at the transmitters, the correlation matrixRk at thekth BC receiver

(and then thekth MAC transmitter) is replaced by the productR12

k PkR12

k , with Rk translating the channel correlation

at the transmitter andPk the transmit signal covariance matrix. Finding out the matricesPk which maximize (5)

is a very difficult problem for finiteN , which, up to the authors’ knowledge, has not been solved, unless all the

Tk matrices are equal [24]. In Section IV-A, we determine the matrices P⋆k which maximize the deterministic

equivalentV(0) of V in (5). The following points are of noticeable importance,

• the eigenspaces of the capacity-maximizingP⋆k matrices (in the deterministic equivalent) coincide respectively

with the eigenspaces of the correlation matricesRk at the receivers.

• the eigenvalues of theP⋆k matrices are solution of a classical optimization problem.They are given by a

water-filling process.

• we provide an iterative water-filling algorithm for the eigenvalues ofPk which, upon convergence, are proved

to converge to the optimal eigenvalues ofP⋆k, for all k. This is to say, we do not prove that the algorithm

converges (though after intensive simulations, it always converged) but we state and can prove that, if it does,

then it converges towards the sought solution.

It is important to understand here that, contrary to [24], the above result does not imply that the capacity-

maximizing power allocation in the finiteN regime consists in aligning the eigenvectors of thePk matrices to

8

those of the transmit correlation matrices; we show that this strategy does optimize the deterministic equivalent

though.

III. M ATHEMATICAL PRELIMINARIES

In this section, we first introduce Theorem 1, which providesa deterministic equivalent for the Stieltjes transform

of matricesBN of the type (1). The underlying assumptions of this theorem are made as large as possible for

mathematical completeness. The Shannon transform ofBN is then provided in Theorem 2, under tighter assumptions

on the matricesXk, Rk andTk.

Theorem 1:Let K ∈ N∗ be some fixed positive integer. For someN ∈ N∗, let

BN =

K∑

k=1

R12

k XkTkXH

kR12

k + S (8)

be anN × N matrix with the following hypothesis for allk ∈ {1, . . . , K},

1) Xk =(

1√nk

Xkij

)is N × nk with the Xk

ij identically distributed complex for allN , i, j, independent for

each fixedN , andE|Xk11 − EXk

11|2 = 1,

2) R12

k is the N × N Hermitian nonnegative definite square root of the nonnegative definite Hermitian matrix

Rk,

3) Tk = diag(τ1, . . . , τnk) is nk × nk, nk ∈ N∗, diagonal withτi ≥ 0,

4) The sequences{FTk}nk≥1 and {FRk}N≥1 are tight, i.e. for allε > 0, there existsM0 > 0 such that

M > M0 implies FTk([M,∞)) < ε andFRk([M,∞)) < ε for all nk, N ,

5) S is N × N Hermitian positive definite,

6) There existsb > a > 0 for which

a ≤ lim infN

ck ≤ lim supN

cN ≤ b (9)

with ck = N/nk.

Also denote, forz ∈ C \ R+, mN (z) =∫

(λ − z)−1dFBN (λ), the Stieltjes transform ofBN . Then, as allN and

nk grow large, with (non necessarily fixed) ratiock,

mN (z) − m(0)N (z)

a.s.−→ 0 (10)

where

m(0)N (z) =

1

Ntr

(S +

K∑

k=1

∫τkdFTk(τk)

1 + ckτkek(z)Rk − zIN

)−1

(11)

and the set of functions{ei(z)}, i ∈ {1, . . . , K}, form the unique solution to theK equations

ei(z) =1

NtrRi

(S +

K∑

k=1

∫τkdFTk(τk)

1 + ckτkek(z)Rk − zIN

)−1

(12)

such thatsgn(ℑ[ei(z)]) = sgn(ℑ[z]).

9

Moreover, for anyε > 0, the convergence of Equation (10) is uniform over any regionof C bounded by a contour

interior to

C \ ({z : |z| ≤ ε} ∪ {z = x + iv : x > 0, |v| ≤ ε})

For allN , the functionm(0)N is the Stieltjes transform of a distributionF 0

N . DenotingFBN the empirical eigenvalue

distribution ofBN , we finally have

FBN − F 0N ⇒ 0 (13)

weakly asN → ∞.

Proof: The proof of Theorem 1 is deferred to Appendix A.

Looser hypothesis will be used in the applications of Theorem 1 provided in Section IV. We will specifically

need the corollary hereafter,

Corollary 1: Let K ∈ N be some positive integer. For someN ∈ N∗, let

BN =

K∑

k=1

R12

k XkTkXH

kR12

k (14)

be anN × N matrix with the following hypothesis for allk ∈ {1, . . . , K},

1) Xk =(

1√nk

Xkij

)is N × nk with joint distribution invariant to right unitary product, where theXk

ij are

independent and identically distributed for eachi, j, N , with finite fourth order moment.

2) R12

k is the N × N Hermitian nonnegative definite square root of the nonnegative definite Hermitian matrix

Rk,

3) Tk is annk × nk nonnegative definite Hermitian matrix,

4) The sequences{FTk}nk≥1 and{FRk}N≥1 are tight.

5) There existsb > a > 0 for which

a ≤ lim infN

ck ≤ lim supN

cN ≤ b (15)

with ck = N/nk.

Also denote, forx < 0, mN (x) = 1N (BN − xIN )−1. Then, as allN andnk grow large (whileK is fixed), with

ratio ck

mN(x) − m(0)N (x)

a.s.−→ 0 (16)

where

m(0)N (x) =

1

Ntr

(K∑

k=1

∫τkdFTk(τk)

1 + ckτkek(z)Rk − xIN

)−1

(17)

and the set of functions{ei(z)}, i ∈ {1, . . . , K}, form the unique solution to theK equations

ei(z) =1

NtrRi

(K∑

k=1

∫τkdFTk(τk)

1 + ckτkek(z)Rk − zIN

)−1

(18)

such thatsgn(ℑ[ei(z)]) = sgn(ℑ[z]).

10

Proof: Since theXk ’s are invariant to right-unitary product, the joint distribution of XkU coincides that of

Xk, for U anynk ×nk unitary matrix. Therefore,XkTkXH

k in Theorem 1 can be substituted byXk(UTkUH)XH

k

without compromising the final result. As a consequence, theTk ’s can be taken non diagonal nonnegative definite

Hermitian and the result of Theorem 1 still holds true.

The deterministic equivalent of the Stieltjes transformmN of BN is then extended to a deterministic equivalent

of the Shannon transform ofBN in the following result,

Theorem 2:Let x be some strictly positive real number. LetBN be a random Hermitian matrix as defined in

Corollary 1 with the following additional assumptions

1) there existsα > 0 and a sequencerN , such that, for allN ,

max1≤k≤K

max(λTk

rN +1, λRk

rN +1) ≤ α (19)

whereλX1 ≥ . . . ≥ λX

N denote the ordered eigenvalues of theN × N matrix X.

2) denotingbN an upper-bound on the spectral norm of theTk ’s andRk ’s, k ∈ {1, . . . , K}, andβ some real

such thatβ > K(b/a)(1 +√

a)2, thenaN = b2Nβ satisfies

rN log(1 + aN/x) = o(N) (20)

Then, for largeN , nk, the Shannon transformV(x) =∫

log(1 + λ)dFBN (λ) of BN , satisfies

V(x) − V(0)(x)

a.s.−→ 0 (21)

where

V(0)(x) =

1

Nlog det

(IN +

1

x

K∑

k=1

Rk

∫τk

1 + ckek(−x)τkdFTk(τk)

)

+

K∑

k=1

1

ck

∫log (1 + ckek(−x)τk) dFTk(τk)

+ x · m(0)N (−x) − 1 (22)

Proof: The proof of Theorem 2 is provided in Appendix B.

Remark 1:Remind that, since Gaussian matrices are right-unitary invariant with entries of finite fourth moment,

the special case ofXk matrices with Gaussian i.i.d. entries fits the requirementsof Corollary 1 and Theorem 2. In

the practical applications of the following section, we shall always consider this hypothesis.

Remark 2:Note that this last result is consistent with the work of Telatar [7] and Tulino [2] whenK = 1 and

the transmission channels are respectively Gaussian with no correlation or with separable variance profile. The

latter is easy to observe from the form of Equation (206) which, up to a Stieltjes-η variable change, is exactly the

expression in [2] whenK = 1 with separable variance profile.

11

H1

H2 H3

HK

User1

User2

User3

UserK

Base station

Fig. 1. Downlink scenario in multi-user broadcast channel

IV. A PPLICATIONS

In this section, we provide applications of Theorems 1 and 2 to two studies in the field of wireless communications.

First, in Section IV-A, we derive an expression of the rate region of multi-antenna multiple access and broadcast

channels for given correlation matrices at the transmit andreceive devices. An iterative power allocation algorithm

is then introduced which is proved, upon convergence, to maximize the deterministic equivalent of the rate region

corner points. Then, in Section IV-B, we provide an analytical expression of the capacity of the single user and

MMSE decoders in wireless MIMO networks with inter-cell interference. For the single user decoder, a derivation

of the optimal power allocation for the sum rate maximization is also provided.

A. Rate Region of Broadcast Channels

1) System Model:Consider a wireless multi-user channel withK ≥ 1 users indexed from1 to K, controlled

by a single base station. Userk is equipped withnk antennas while the base station is equipped withN antennas.

We additionally denoteck = N/nk. This situation is depicted in Figure 1.

Even if we will longly discuss the MAC channel between theK users and the base station, our prior objective

is to characterize the BC channel between the base station and the users. For this, we denotes ∈ CN , E[ssH] = P,

the signal transmitted by the base station, with power constraint tr(P) ≤ P , P > 0; yk ∈ Cnk the signal received

by userk andnk ∼ CN(0, σ2Ink) the noise vector received by userk.7 The fading MIMO channel between the

base station and userk is denotedHk ∈ CN×nk . Moreover we assume thatHk has a separable variance profile,

7up to a scaling of the power constraints of the individual users, setting the same noise varianceσ2 on each receive antenna for every user

does not restrict the generality and simplifies the theoretical expressions.

12

i.e. can be decomposed as

Hk = R12

k XkT12

k (23)

with Rk ∈ Cnk×nk the (Hermitian) correlation matrix at receiverk with respect to the channelHk, Tk ∈ C

N×N

the correlation matrix at the base station for linkHk andXk ∈ Cnk×N a random matrix with Gaussian independent

entries of variance1/nk. In the Kronecker model,Tk andRk satisfy tr(Tk) = N and tr(Rk) = nk.

With the assumptions above, the downlink communication model unfolds

yk = Hks + n (24)

Denoting equivalentlysk the signal transmitted in the dual uplink by userk, such thatE[sksH

k ] = Pk, tr(Pk) ≤Pk, y andn the signal and the noise received by the base station, we havethe converse uplink model

y =

K∑

k=1

HH

k sk + nk (25)

In the following, we will derive the BC rate region by means ofthe MAC-BC duality [20]. We then consider

first the achievable MAC rate region.

2) MAC Rate Region:The (per-receive antenna normalized) rate regionCMAC(P1, . . . , PK ;HH) of the MAC

channelHH under respective transmit power constraintsP1, . . . , PK for users1 to K respectively and compound

channelHH = [HH1 . . .HH

K ], is given in [22], and reads

CMAC(P1, . . . , PK ;HH) =⋃

tr(Pi)≤Pi

Pi≥0i=1,...,K

{{Ri, 1 ≤ i ≤ K} :

i∈S

Ri ≤1

Nlog

∣∣∣∣∣I +1

σ2

i∈S

HH

i PiHi

∣∣∣∣∣ , ∀S ⊂ {1, . . . , K}}

(26)

For any setS ⊂ {1, . . . , K}, thanks to Theorem 2, we have approximately, forN , nk large,

1

Nlog

∣∣∣∣∣IN +1

σ2

i∈S

HH

i PiHi

∣∣∣∣∣ =1

Nlog det

(IN +

1

σ2

k∈S

Tk

∫rk

1 + ckek(−σ2)rkdFRkPk(rk)

)

+∑

k∈S

1

ck

∫log(1 + ckek(−σ2)rk

)dFRkPk(rk)

+ σ2 · m(0)S

(−σ2) − 1 (27)

where

mS(−σ2) =1

Ntr

(∑

k∈S

∫rkdFRkPk(rk)

1 + ckrkek(−σ2)Tk + σ2IN

)−1

(28)

and theei’s satisfy

ei(−σ2) =1

NtrTi

(∑

k∈S

∫rkdFRkPk(rk)

1 + ckrkek(−σ2)Tk + σ2IN

)−1

(29)

From these equations, the complete MAC capacity region can be described. To go further, we first need the

following result,

13

Proposition 1: If any of the correlation matricesRk, k ∈ S is invertible, then the right-hand side of the capacity

approximate (27) is a strictly concave function ofP1, . . . ,P|S|.

Proof: The proof of Proposition 1 is provided in Appendix C.

From Proposition 1, we immediately prove that the|S|-ary set of capacity-achieving matrices(P⋆1, . . . ,P

⋆|S|)

is unique, provided that any of theRk ’s is invertible. In a very similar way as in [27], we will showthat the

matricesP⋆k, k ∈ {1, . . . , |S|}, have the following properties: (i) their eigenspaces are the same as those of their

correspondingRk’s, (ii) their eigenvalues are the solutions of a classical water-filling problem.

Proposition 2: For everyk ∈ S, denoteRk = UkDkUH

k the spectral decomposition ofRk with Uk unitary and

Dk = diag(rk1, . . . , rknk) diagonal. Then the capacity-achieving signal covariance matricesP⋆

1, . . . ,P⋆|S| satisfy

1) P⋆k = UkQ

⋆kU

H

k , with Q⋆k diagonal; i.e. the eigenspace ofP⋆

k is the same as the eigenspace ofRk.

2) denotingδ⋆k = δk(−σ2,P⋆

k) ande⋆k = ek(−σ2,P⋆

k), the ith diagonal entryq⋆ki of Q⋆

k satisfies

q⋆ki =

(µk − 1

cke⋆krki

)+

(30)

where theµk ’s are evaluated such thattr(Qk) = Pk.

Proof: The proof of Proposition 2 recalls the proof from Najim, Prop. 4 in [27]. We essentially need to show

that, at point(δ⋆1 , . . . , δ⋆

|S|, e⋆1, . . . , e

⋆|S|), the derivative of (27) along anyQk is the same whether theδ⋆

k ’s and the

e⋆k ’s are fixed or vary withQk. In other words, using the form (206) for the capacity, let usdefine the functions

V(0)(P1, . . . ,P|S|) =

k∈S

1

Nlog det (Ink

+ ckekRkPk)

+1

Nlog det

(IN +

k∈S

δkTk

)

− σ2K∑

k=1

δk(−σ2)ek(−σ2) (31)

where

ei = ei(P1, . . . ,P|S|) =1

NtrTi

(σ2

[IN +

k∈S

δkTk

])−1

(32)

δi = δi(P1, . . . ,P|S|) =1

nitrRiPi

(σ2 [Ini

+ ciei(z)RiPi])−1

(33)

andV : (P1, . . . ,P|S|, δ1, . . . , δ|S|, e1, . . . , e|S|) 7→ V(0)(P1, . . . ,P|S|). Then we need only prove that, for allk ∈ S,

∂V

∂δk(P1, . . . ,P|S|, δ

⋆1 , . . . , δ⋆

|S|, e⋆1, . . . , e

⋆|S|) = 0 (34)

∂V

∂ek(P1, . . . ,P|S|, δ

⋆1 , . . . , δ⋆

|S|, e⋆1, . . . , e

⋆|S|) = 0 (35)

Remark then that

∂V

∂δk(P1, . . . ,P|S|, δ1, . . . , δ|S|, e1, . . . , e|S|) =

1

Ntr

(

I +∑

i∈S

δiTi

)−1

Tk

− σ2ek (36)

∂V

∂ek(P1, . . . ,P|S|, δ1, . . . , δ|S|, e1, . . . , e|S|) = ck

1

Ntr[(I + ckekRiPi)

−1RkPk

]− σ2δk (37)

14

At initialization, for all k ∈ S, Qk = Qknk

Ink, δk = 1, ek = 1.

while the Pk ’s have not convergeddo

for k ∈ S do

Set (δk, ek) as solution of (32), (33)

for i = 1 . . . , nk do

Set qk,i =“

µk − 1

ckekrki

”+

, with µk such thattrQk = Pk.

end for

end for

end while

TABLE I

ITERATIVE WATER-FILLING ALGORITHM FOR POWER OPTIMIZATION

both being null whenever, for allk, ek = ek(−σ2,P1, . . . ,P|S|) andδk = δk(−σ2,P1, . . . ,P|S|), which is true in

particular for the unique power optimal solutionP⋆1, . . . ,P

⋆|S| wheneverek = e⋆

k andδk = δ⋆k.

When, for all k, ek = e⋆k, δk = δ⋆

k, the maximum ofV over thePk ’s is then obtained by maximizing the

expressionslog det(Ink+ cke⋆

kRkPk) overPk. By Hadamard’s inequality,

det(Ink+ cke⋆

kRkPk) ≤nk∏

i=1

‖(Ink+ cke⋆

kRkPk)i‖2 (38)

where, only here, we denote(X)i theith column of matrixX. The equality is obtained if and only ifInk+cke⋆

kRkPk

is diagonal. The equality case arises forPk andRk = UkDkUH

k co-diagonalizable. In this case, denotingPk =

UkQkUH

k , the entries ofQk, constrained bytr(Qk) = Pk are solutions of the classical optimization problem under

constraint,

supQk

tr(Qk)≤Pk

log det (Ink+ cke⋆

kQkDk) (39)

whose solution is given by the classical water-filling algorithm. Hence (30).

We then propose an iterative water-filling algorithm to obtain the power allocation policy which maximizes the

right-hand side of (27). This is provided in Table I.

In [27], it is proven that the convergence of this algorithm ensures its convergence towards the only capacity-

achieving power allocation policy. However, as recalled in[27], it is difficult to prove the convergence of the

algorithm in Table I in general. Nonetheless, extensive simulations initialized withδk = 1, ek = 1, for all k,

showed the algorithm in Table I always converged.

Remark 3: It is important to stress out the fact that we did not prove that the classical water-filling solution

maximizes the true capacity forN finite. We merely showed that, for largeN , the true capacity expression on the

left-hand side of (27) is closely approximated by an expression (the right-hand side of (27)) whose maximum is

achieved by the iterative water-filling algorithm of Table I. The optimality of the water-filling solution for finiteN

is proven, e.g. in [24], for the special case when allRk are equal.

15

H1

H2

HK

User1

Base station1

Base station2

Base stationK

Fig. 2. Downlink multi-cell scenario

3) BC Rate Region:The capacity region of the broadcast multi-antenna channelhas been recently shown [23]

to be achieved by the dirty paper coding (DPC) algorithm. This regionCBC(P ;H), for a transmit power constraint

P over the compound channelH, is shown by duality arguments to be the set [20]

CBC(P ;H) =⋃

PKk=1 Pk≤P

CMAC(P1, . . . , PK ;HH) (40)

which is easily obtained from Equation (26).

B. Multi-User MIMO

1) Signal Model: In this section we study the per-antenna rate performance ofwireless networks including a

multi-antenna transmitter and a multi-antenna receiver, the latter of which is interfered by several multi-antenna

transmitters. This scheme is well-suited to multi-cell wireless networks with orthogonal intra-cell and interfering

inter-cell transmissions, both in downlink and in uplink. In particular, this encompasses

• multi-cell uplink: consider aK-cell network; the base station of a cell indexed byi ∈ {1, . . . , K} receives

data from a terminal user in this cell8 and is interfered byK − 1 users transmitting on the same physical

resource from remote cells indexed byj ∈ {1, . . . , K}, j 6= i.

• multi-cell downlink: the user being allocated a given time/frequency resource in a cell indexed byi ∈{1, . . . , K} receives data from its dedicated base-station and is interfered byK−1 base stations in neighboring

cells indexed byj ∈ {1, . . . , K}, j 6= i. This situation is depicted in Figure 2.

8this user is allocated a given time/frequency resource, which is orthogonal to time/frequency resources of the other users in the cell; e.g. the

multi-access protocol is OFDMA.

16

In the following, in order not to confuse both scenarios, only the downlink scheme is considered. However, one

must keep in mind that the provided results can easily be adapted to the uplink case.

Consider a wireless mobile network withK ≥ 1 cells indexed from1 to K, controlled bynon-physically

connectedbase stations. We assume that, on a particular time or frequency resource, each base station serves only

one user; therefore the base station and the user of cellj will also be indexed byj. Without loss of generality, we

focus our attention on user1, equipped withN ≫ K antennas and hereafter referred to asthe useror the receiver.

Every base stationj ∈ {1, . . . , K} is equipped withnj ≫ K antennas. Similarly to previous sections, we denote

cj = N/nj.

Denotesj ∈ Cnj , E[sjsH

j ] = Inj, the signal transmitted by base stationj, y ∈ CN and n ∼ CN(0, σ2IN )

the signal and noise vectors received by the user. The fadingMIMO channel between base stationj and the user

is denotedHj ∈ CN×nj . Moreover, we assume thatHj is Gaussian with a separable variance profile, given by

Equation (23).

With the assumptions above, the communication model unfolds

y = H1s1 +

K∑

j=2

Hjsj + n (41)

wheres1 is the useful signal (from base station1) andsj , j ≥ 2, constitute interfering signals.

2) Single User Decoding:

a) Uniform Power Allocation:If the receiving user considers the signals from theK−1 interfering transmitters

as Gaussian noise with a known variance pattern9, then base station1 can transmit with arbitrarily low decoding

error at a per-receive antenna rateCSU(σ2) given by

CSU(σ2) =1

Nlog |IN +

1

σ2

K∑

j=1

HjHH

j | −1

Nlog |IN +

1

σ2

K∑

j=2

HjHH

j | (42)

Assume thatN and theni, i ∈ {1, . . . , K}, are large compared toK and such that the maximum eigenvalues

of Ri or Ti are bounded away fromN . From Corollary 1, we define the functionsmi,(0) as the approximated

Stieltjes transforms of∑K

j=i HjHH

j , i ∈ {1, 2},

mi,(0)(z) =1

Ntr

(K∑

k=1

∫τkdFTk(τk)

1 + ckτkeik(z)

Rk − zIN

)−1

(43)

where, for allj ∈ {1, . . . , K}, eij(z) is solution of the fixed-point equation

eij(z) =

1

NtrRj

(K∑

k=i

∫τkdFTk(τk)

1 + ckτkek(z)Rk − zIN

)−1

(44)

9in practice, the noise variance does not require the knowledge of the other base station to user channels, as it can be inferred from data

sensing in idle mode.

17

From Theorem 2, we then have approximately

CSU(σ2) =1

Nlog det

(IN +

1

σ2

K∑

k=1

Rk

∫τk

1 + cke1k(−σ2)τk

dFTk(τk)

)

− 1

Nlog det

(IN +

1

σ2

K∑

k=2

Rk

∫τk

1 + cke2k(−σ2)τk

dFTk(τk)

)

+K∑

k=1

1

ck

∫log(1 + cke1

k(−σ2)τk

)dFTk(τk)

−K∑

k=2

1

ck

∫log(1 + cke2

k(−σ2)τk

)dFTk(τk)

+ σ2 · [m1,(0)(−σ2) − m2,(0)(−σ2)] (45)

b) Power Optimization for Single-User Decoding:In this section we wish to perform power allocation so to

maximize the single-user decoding capacityCSU(σ2) along P1, the signal variance at the main transmitter. We

therefore replace the matricesTj by T12

j PjT12

j in Equation (45).

Let us rewriteCSU(σ2) like

CSU(σ2) =1

Nlog |IN +

1

σ2A− 1

2 R121 X1T

121 P1T

121 XH

1 R121 A− 1

2 | (46)

with A = IN + 1σ2

∑j>1 HjPjH

H

j .

According to [25], the optimal power allocation strategy insuch a model isP1 =∑n1

i=1 pivH

i vi with V =

[v1, . . . ,vn] the eigenvector matrix in the spectral decomposition ofT1 = VΛVH, Λ diagonal, where thepi’s

satisfy

pi = 0 (1/αi) − 1 ≤ 1

n1

∑n1

l=1(1 − αl)

pi = 1−αi1

n1

Pn1l=1(1−αl)

otherwise(47)

for αi defined as

αi =

(1 +

1

σ2hH

i A− 12

[IN +

1

σ2A− 1

2 H−iP1HH

−iA− 1

2

]−1

A− 12 hi

)−1

(48)

hi is the ith column ofH1, andH−i is H1 with column i removed.

Denotinghi = R121 xi, we havexi centered Gaussian with covarianceT1ii

/n1IN and independent of

R121 A− 1

2

[IN + 1

σ2 A− 1

2 H−iP1HH

−iA− 1

2

]−1

A− 12 R

121 . Therefore, asymptotically onN , from Lemma 7,

αi =

(1 +

T1ii

σ2NtrR1

[A +

1

σ2H−iP1H

H

−i

]−1)−1

(49)

=

(1 +

T1ii

σ2NtrR1

[A +

1

σ2H1P1H

H

1

]−1)−1

(50)

=

1 +

T1ii

σ2NtrR1

IN +

1

σ2

K∑

j=1

HjPjHH

j

−1

−1

(51)

=(1 + T1ii

e11(−σ2)

)−1(52)

18

This leads to the power allocation

pi = 0 T1iie11(−σ2) ≤ 1

n1

∑n1

l=1 1 −(1 + T1ll

e11(−σ2)

)−1

pi =1−(1+T1ii

e11(−σ2))

−1

1n1

Pn1l=1 1−(1+T1ll

e11(−σ2))−1 otherwise

(53)

In particular, if the diagonal entries ofT1 are all equal (as is often the case in practice), then the optimal power

allocation policy is the trivial equal power allocation.

3) MMSE Decoding:Achieving CSU requires non-linear processing at the receiver, such as successive MMSE

interference cancellation techniques. A suboptimal linear technique, the MMSE decoder, is often used instead.

The communication model in this case reads

y =

k∑

j=1

HjHH

j + σ2IN

−1

HH

1

k∑

j=1

Hjsj + n

(54)

where(∑k

j=1 HjHH

j + σ2IN

)−1

HH

1 is the MMSE linear filter at the receiver. Each entry ofy will be processed

individually.

This technique makes it possible to transmit data reliably at any rate inferior to the per-antenna MMSE capacity

CMMSE,

CMMSE(σ2) =1

N

n1∑

i=1

log(1 + γi) (55)

where, denotinghj ∈ Cnj the jth column ofH1 andR121 xj = hj , the SINRγi expresses as

γi =hH

i

(∑Kj=1 HjH

H

j + σ2IN

)−1

hi

1 − hH

i

(∑Kj=1 HjH

H

j + σ2IN

)−1

hi

(56)

= hH

i

K∑

j=1

HjHH

j − hihH

i + σ2IN

−1

hi (57)

= xH

i R12

i

K∑

j=1

HjHH

j − hihH

i + σ2IN

−1

R12

i xi (58)

where Equation (57) comes from a direct application of the matrix inversion lemma. With these notations,xi

has i.i.d. complex Gaussian entries with varianceT1ii/ni and the inner matrix of the right-hand side of (58) is

independent ofxi (since the entries ofH1HH1 − hih

H

i are independent of the entrieshi). Applying Lemma 7, for

N large,

γi =T1ii

NtrR1

K∑

j=1

HjHH

j − hihH

i + σ2IN

−1

(59)

From Lemma 5, the rank 1 perturbation(−hihH

i ) does not affect asymptotically the trace in (59). Therefore,

approximately,

γi =T1ii

NtrR1

K∑

j=1

HjHH

j + σ2IN

−1

(60)

19

Noting thate11(z) in Equation (44) corresponds to the normalized trace in Equation (60), we finally have the

compact expression forCMMSE,

CMMSE(σ2) =1

N

n1∑

i=1

log(1 + T1ii

e11(−σ2)

)(61)

In practice, when no specific power allocation strategy is applied and when no exotic transmit correlation is

present,T1ii= P/nk the average power per transmit symbol, and the capacity becomesCMMSE = 1

c1· log(1 +

P/nk · e1(−σ2)).

V. SIMULATIONS AND RESULTS

In the following, we apply the results obtained in Sections IV-A and IV-B for the broadcast channel rate region

and the multi-user MIMO cases respectively.

A. BC Rate Region

First, we provide simulation results in the context of a two-user broadcast channel, withN = 8 transmit antennas

and n1 = n2 = 4 receive antennas, all placed in linear arrays. The distancedT between subsequent transmit

antennas is such thatdT/λ = 10, λ the signal median wavelength; between adjacent receive antennas, the distance

dR is the same for users1 and 2 and such thatdR/λ = 1/4. The variance profile is generated thanks to Jakes’

model, with privileged directions of signal departure and arrival. For instance, the entry(a, b) of matrix T1 is

T1ab=

∫ θ(T1)max,

θ(T1)

min

exp

(2πi|a − b|d

T

λcos(θ)

)dθ (62)

where, in our case, with obvious notations,θ(R1)min = 0, θ

(R1)max = π/2, θ

(R2)min = π/3, θ

(R2)max = 5π/6, θ

(T1)min = 2π/3,

θ(T1)max, = −5π/6, θ

(T2)min = π, θ

(T2)max = −π/2.

In Figure 3, we take a20 dB value for the SNR and compare the rate region obtained from the convex optimization

scheme of Section IV to the equal power allocation strategy.Notice that, in spite of the strong correlation exhibiting

very low eigenvalues inR1 and R2, very little is gained by our power allocation scheme compared to uniform

power allocation. Quite to the contrary, in Figure 4, we consider a−5 dB SNR, and observe a substantial gain in

capacity from the optimal iterative water-filling algorithm of Table I compared to equal power allocation on the

transmit antenna array.

In Figure 5, a comparison is made between the theoretical andsimulated rate regions for a20 dB SNR. The

latter is obtained from1, 000 averaged Monte Carlo simulations for every transmit power pair (P1, P2). We observe

an almost perfect fit, even for these lowN = 8, n1 = n2 = 4 numbers of transmit and receive antennas.

B. Multi-User MIMO

We now apply Equations (45) and (61) to the downlink of a two-cell network. The capacity analyzed here is

the per-antenna achievable rate on the link between base station 1 and the user, the latter of which is interfered by

base station2. The relative power of the interfering signal from base station 2 is on averageΓ times that of user

20

0 0.5 1 1.5 2 2.5 30

1

2

3

Rate of User1 [bits/s/Hz]

Rat

eof

Use

r2[b

its/s

/Hz]

Fig. 3. (Per-antenna) rate regionCBC for K = 2 users,N = 8, n1 = n2 = 4, SNR = 20 dB, random transmit-receive solid angle of

apertureπ/2, dT/λ = 10, dR/λ = 1/4. In thick line, capacity limit whenE[ssH] = IN .

1. Both base stations1 and2 are equipped with linear arrays ofn antennas and the user with a linear array ofN

antennas. The correlation matricesTi at the transmission andRi at the reception,i ∈ {1, 2}, are also modeled

thanks to the generalized Jake’s model.

In Figure 6, we tookN = 16, Γ = 0.25 and we consider single-user decoding at the receiver. For every realization

of Ti, Ri, 1000 channel realizations are processed to produce the simulated ergodic capacity and compared to the

theoretical capacity (61). Those capacities are then averaged over100 realizations ofTi, Ri, varying in the random

choice ofθ(Ri)min , θ

(Ti)min , θ

(Ri)max and θ

(Ti)max with constraintθ(Ri)

max − θ(Ri)min = θ

(Ti)max − θ

(Ti)min = π/2, while the distance

between antennas indexed bya andb aredTi

ab = 10λ|a− b| at the transmitters anddRab = 2λ|a− b| at the receiver.

The SNR ranges from−5 dB to 30 dB, andn ∈ {8, 16}. We observe here that Monte-Carlo simulations perfectly

match the capacity obtained from Equation (45).

In Figure 7, with the same assumptions as previously, we apply MMSE decoding at the base station. Here, a

slight difference is observed in the high SNR regime betweentheory and practice. This was somehow expected,

since the largeN approximations in Lemmas 7 and 5 especially are very loose for σ2 close toR−. To cope with

this gap, many more antennas must be used. We also observe a significant difference in performance between the

optimal single-user and the suboptimal linear MMSE decoders, especially in the high SNR region. Therefore, in

wireless networks, when interfering cells are treated as Gaussian correlated noise at the cell-edge, i.e. where the

21

0 0.1 0.2 0.3 0.4 0.5 0.60

0.2

0.4

0.6

Rate of User1 [bits/s/Hz]

Rat

eof

Use

r2[b

its/s

/Hz]

Fig. 4. (Per-antenna) rate regionCBC for K = 2 users,N = 8, n1 = n2 = 4, SNR = −5 dB, random transmit-receive solid angle of

apertureπ/2, dT/λ = 10, dR/λ = 1/4. In thick line, capacity limit whenE[ssH] = IN .

interference is maximum, the MMSE decoder provides tremendous performance loss.

Finally, in Figure 8, we study the effect of the angle spread of energy transmission and reception. As previously

recalled, in most contributions treating antenna correlation, only the distances between the antennas is considered as

relevant to the signal correlation. We provide hereafter a classical situation for which angles of departure and arrival

are of critical importance to the system rate performance. The situation is similar to that of Figure 6 withn = 8

(N = 16), SNR = 20 dB; the median directional of arrival and departure (for all devices) are taken fromπ/32

(grazing angle) toπ/2 (signal transmitted/received in front), while the angle spread, on thex-axis, varies fromπ/32

to π/2. First, note importantly that the directions of departure or arrival are significant to the system performance:

for instance, with a quite realistic angle spread ofπ/4, the single-user decoding capacity shows a three-fold increase

from grazing to orthogonal angles.10 We observe that, in all cases, the per-antenna capacity grows with the angle

spread, which therefore offers some sort of diversity gain.Also, grazing angles provide significantly less diversity

gain than square angles of departure or arrival.

10remember that we assume isotropic energy emission/reception in the top-bottom direction and we consider here a restrictive angle of captured

energy in the horizontal plane so that, in reality, one mightexpect even more staggering results.

22

0 0.5 1 1.5 2 2.5 30

1

2

3

Rate of User1 [bits/s/Hz]

Rat

eof

Use

r2[b

its/s

/Hz]

Theory

Simulation

Fig. 5. (Per-antenna) rate regionCBC for K = 2 users, theory against simulation,N = 8, n1 = n2 = 4, SNR = 20 dB, random

transmit-receive solid angle of apertureπ/2, dT/λ = 10, dR/λ = 1/4.

VI. CONCLUSION

In this contribution, we proposed to analyze the per-antenna rate performance of a wide family of multi-antenna

communication schemes including multiple cells and/or multiple users, and taking into account the correlation effects

due to antenna closeness and directional energy transmission/reception. As an introductory example, we studied

the rate region of MAC and BC channels, as well as the uplink and downlink capacity of multi-cell networks with

interference. Our main results stem from a novel mathematical deterministic equivalent of the Stieltjes transform

and the Shannon transform of a certain type of large random matrices. Based on these new tools, an accurate

analysis of the effects of correlation can be directly translated into the antenna efficiency of multi-user multi-cell

systems. We also provided an iterative water-filling algorithm to achieve the capacity boundary of the MAC and

BC rate regions, which proved in simulation to give tremendous capacity gains in the low SNR regime.

23

−5 0 5 10 15 20 25 300

1

2

3

4

SNR [dB]

Per

-rec

eive

ante

nna

capa

cityCSU

[bits

/s/H

z] Simulation,n = 8

Theory,n = 8

Simulation,n = 16

Theory,n = 16

Fig. 6. Capacity of point-to-point MIMO in two-cell uplink,single-user decoding,N = 16, n ∈ {8, 16}, Γ = 25%.

APPENDIX A

PROOF OFTHEOREM 1

Proof: For ease of read, the proof will be divided into several sections.

We first consider the caseK = 1, whose generalization toK ≥ 1 is given in Appendix A-E. Therefore, in the

coming sections, we drop the useless indexes.

A. Truncation and centralization

We begin with the truncation and centralization steps whichwill replaceX, R andT by matrices with bounded

entries, more suitable for analysis; the difference of the Stieltjes transforms of the original and newBN converging to

zero. Since vague convergence of distribution functions isequivalent to the convergence of their Stieltjes transforms,

it is sufficient to show the original and new empirical distribution functions of the eigenvalues approach each other

almost surely in the space of subprobability measures onR with respect to the topology which yields vague

convergence.

Let Xij = Xij1{|Xij |<√

N}−E(Xij1{|Xij |<√

N}) andX =(

1√nXij

). Then, from c), Lemma 1 and a), Lemma

3, it follows exactly as in the initial truncation and centralization steps in [4] and [19] (which provide more details

in their appendices), that

|FBN − FS+R12 eXTeXHR

12 | a.s.−→ 0 (63)

24

−5 0 5 10 15 20 25 300

1

2

3

4

SNR [dB]

Per

-rec

eive

ante

nna

capa

cityCM

MSE

[bits

/s/H

z]Simulation,n = 8

Theory,n = 8

Simulation,n = 16

Theory,n = 16

Fig. 7. Capacity of point-to-point MIMO in two-cell uplink,with MMSE decoder,N = 16, n ∈ {8, 16}, Γ = 25%.

asN → ∞.

Let now X ij = Xij · 1{|Xij |<ln N} −E(Xij1I{|Xij |<ln N}) andX =(

1√nX ij

). This is the final truncation and

centralization step, which will be practically handled thesame way as in [4], which some minor modifications,

given presently.

For any Hermitian non-negative definiter × r matrix A, let λAi denote itsi-th smallest eigenvalue ofA. With

A = Udiag(λA1 , . . . , λA

r )UH its spectral decomposition, let for anyα > 0

Aα = Udiag(λA1 1{λA

r ≤α}, . . . , λAr 1{λ1≤α})U

H (64)

Then for anyN × N matrix Q, we get from 1) and 2), Lemma 3,

‖FS+R12 QTQHR

12 − FS+R

12

αQTαQHR12

α

‖ ≤ 2

Nrank(R

12 − R

12 α) +

1

Nrank(T − Tα) (65)

=2

N

N∑

i=1

1{λR

i>α} +

1

N

n∑

i=1

1{λT

i>α)} (66)

= 2FR((α,∞)) +1

cNFT((α,∞)) (67)

Therefore, from the assumptions 4) and 6) in Theorem 1, we have for any sequence{αN} with αN → ∞

‖FS+R12 QTQHR

12 − FS+R

12

αN QTαN QHR12

αN ‖ → 0 (68)

25

π32

π4

π2

3π4

π0

1

2

3

Departure/arrival angle spread [rad]

Per

-rec

eive

ante

nna

capa

cityCSU

[bits

/s/H

z]

Median angleπ/2

Median angleπ/4

Median angleπ/8

Median angleπ/16

Median angleπ/32

Fig. 8. Capacity of point-to-point MIMO in two-cell uplink,single-user decoding,N = 16, n = 8, Γ = 25%, SNR = 20 dB, for different

median angles of departure/arrival.

asN → ∞.

A metricD on probability measures defined onR, which induces the topology of vague convergence, is introduced

in [4] to handle the last truncation step. The matrices studied in [4] are essentiallyBN with R = IN . Following

the steps beginning at (3.4) in [4], we see in our case that when αN is chosen so that asN → ∞, αN ↑ ∞,

α8N (E|X2

111{X11|≥lnN} + N−1) → 0 (69)

and∞∑

N=1

α16N

N2

(E|X11|41{|X11|<

√N)} + 1

)< ∞ (70)

We will get

D(FS+R12

αN eXTαN eXHR12

αN

, FS+R12

αN XTαN XHR

12

αN

)a.s.−→ 0 (71)

asN → ∞.

Since E|X11|2 → 1 as N → ∞ we can rescale and replaceX with X/√

E|X11|2, whose components are

bounded byk lnN for somek > 2. Let log N denote logarithm ofN with basee1/k (so thatk lnN = log N ).

Therefore, from (68) and (71) we can assume that for eachN the Xij are i.i.d.,EX11 = 0, E|X11|2 = 1, and

|Xij | ≤ log N .

26

Later on the proof will require a restricted growth rate on both ‖R‖ and ‖T‖. We see from (68) that we can

also assume

max(‖R‖, ‖T‖) ≤ log N (72)

B. Deterministic approximation ofmN (z)

Write X = [x1, . . . ,xn], xi ∈ CN and letyj = (1/√

n)R12 xj . Then we can write

BN = S +

n∑

j=1

τjyjyH

j (73)

We assumez ∈ C+ and letv = ℑ[z]. Define

eN = eN (z) = (1/N) trR(BN − zIN)−1 (74)

and

pN = − 1

nz

n∑

j=1

τj

1 + cNτjeN=

∫ −τ

z(1 + cNτeN )dFT(τ) (75)

Write BN = OΛOH, Λ = diag(λ1, . . . , λN ), its spectral decomposition. LetR = {Rij} = OHRO. Then

eN = (1/N) trR(Λ − zIN )−1 = (1/N)

N∑

i=1

Rii

λi − z(76)

We therefore see thateN is the Stieltjes transform of a measure on the nonnegative reals with total mass(1/N) trR.

It follows that botheN(z) and zeN(z) mapC+ into C+. This implies thatpN (z) and zpN(z) mapC+ into C+

and, asz → ∞, zpN(z) → −(1/n) trT. Therefore, from Lemma 6, we also havepN the Stieltjes transform of a

measure on the nonnegative reals with total mass(1/n) trT. From (72), it follows that

|eN | ≤ v−1 log N (77)

and ∣∣∣∣∫

τ

(1 + cNτeN )dFT(τ)

∣∣∣∣ = |zpN(z)| ≤ |z|v−1 log N (78)

More generally, from Lemma 6, any function of the form

−τ

z(1 + m(z))(79)

whereτ ≥ 0 andm(z) is the Stieltjes transform of a finite measure onR+, is the Stieltjes transform of a measure

on the nonnegative reals with total massτ . It follows that∣∣∣∣

τ

1 + m(z)

∣∣∣∣ ≤ τ |z|v−1 (80)

Fix now z ∈ C+. Let B(j) = BN − τjyjyH

j . DefineD = −zIN + S− zpN(z)R. We write

BN − zIN − D =n∑

j=1

τjyjyH

j + zpNR (81)

27

Taking inverses and using Lemma 4 we have

(BN − zIN)−1 − D−1 =

n∑

j=1

τjD−1yjy

H

j (BN − zIN )−1 + zpND−1R(BN − zIN )−1 (82)

=

n∑

j=1

τj

D−1yjyH

j (B(j) − zIN)−1

1 + τjyH

j (B(j) − zIN)−1yj+ zpND−1R(BN − zIN)−1 (83)

Taking traces and dividing byN , we have

1

NtrD−1 − mN (z) =

1

n

n∑

j=1

τjdj ≡ wmN (84)

where

dj =(1/N)xH

j R12 (B(j) − zIN)−1D−1R

12 xj

1 + τjyH

j (B(j) − zIN )−1yj− (1/N) trR(BN − zIN )−1D−1

1 + cNτjeN(85)

Multiplying both sides of the above matrix identity byR, and then taking traces and dividing byN , we find

1

NtrD−1R − eN (z) =

1

n

n∑

j=1

τjdej ≡ we

N (86)

where

dej =

(1/N)xH

j R12 (B(j) − zIN)−1RD−1R

12 xj

1 + τjyH

j (B(j) − zIN)−1yj− (1/N) trR(BN − zIN)−1RD−1

1 + cNτjeN(87)

We then show that, for anyk > 0, almost surely

limN→∞

(logk N)wmN = 0 (88)

and

limn→∞

(logk N)weN = 0 (89)

Notice that for eachj, yH

j (B(j) − zIN )−1yj can be viewed as the Stieltjes transform of a measure onR+.

Therefore from (80) we have ∣∣∣∣∣1

1 + τjyH

j (B(j) − zIN)−1yj

∣∣∣∣∣ ≤|z|v

. (90)

For eachj, let e(j) = e(j)(z) = (1/N) trR(B(j) − zIN)−1, and

p(j) = p(j)(z) =

∫ −τ

z(1 + cNτe(j))dFT(τ) (91)

both being Stieltjes transforms of measures onR+, along with the integrand for eachτ .

Using Lemma 4, Equations (72) and (80), we have

|zpN − zp(j)| = |eN − e(j)|cN

∣∣∣∣∫

τ2

(1 + cNτeN )(1 + cNτe(j))dFT(τ)

∣∣∣∣ ≤cN |z|2 log3 N

Nv3. (92)

Let D(j) = −zIN + S − zp(j)(z)R. Notice that(BN − zIN )−1 and (B(j) − zIN)−1 are bounded in spectral

norm byv−1 and, from Lemma 8, the same holds true forD−1 andD−1(j) .

In order to handle bothwmN , dj andwe

N , dej at the same time, we shall denote byE eitherT or IN , andwN ,

dj for now will denote either the originalwmN , dj or we

N , dej .

28

Write dj = d1j + d2

j + d3j + d4

j , where

d1j =

(1/N)xH

j R12 (B(j) − zIN)−1ED−1R

12 xj

1 + τjyH

j (B(j) − zIN)−1yj−

(1/N)xH

j R12 (B(j) − zIN)−1ED−1

(j)R12 xj

1 + τjyH

j (B(j) − zIN)−1yj(93)

d2j =

(1/N)xH

j R12 (B(j) − zIN)−1ED−1

(j)R12 xj

1 + τjyH

j (B(j) − zIN)−1yj−

(1/N) trR(B(j) − zIN)−1ED−1(j)

1 + τjyH

j (B(j) − zIN)−1yj(94)

d3j =

(1/N) trR(B(j) − zIN)−1ED−1(j)

1 + τjyH

j (B(j) − zIN)−1yj− (1/N) trR(BN − zIN )−1ED−1

1 + τjyH

j (B(j) − zIN)−1yj(95)

d4j =

(1/N) trR(BN − zIN)−1ED−1

1 + τjyH

j (B(j) − zIN)−1yj− (1/N) trR(BN − zIN)−1ED−1

1 + cNτjeN(96)

From Lemma 4, Equations (72), (90) and (92), we have

τj |d1j | ≤

1

N‖xj‖2 cN log7 N |z|3

Nv7(97)

τj |d2j | ≤ |z|v−1 log N

N

∣∣∣xH

j R12 (B(j) − zIN )−1ED−1

(j)R12 xj − trR(B(j) − zIN)−1ED−1

(j)

∣∣∣ (98)

τj |d3j | ≤

|z| log3 N

vN

(1

v2+

cN |z|2 log3 N

v6

)→ 0, asn → ∞ (99)

τj |d4j | ≤

|z|cN log4 N

Nv3

(|xH

j R12 (B(j) − zIN )−1R

12 xj − trR

12 (B(j) − zIN)−1R

12 | + log N

v

)(100)

From Lemma 7, there existsK > 0 such that,

E| 1

N‖xj‖2 − 1|6 ≤ KN−3 log12 N (101)

E1

N6|xH

j R12 (B(j) − zIN)−1ED−1

(j)R12 xj − trR(B(j) − zIN)−1ED−1

(j) |6 ≤ KN−3v−12 log24 N (102)

E1

N6|xH

j R12 (B(j) − zIN)−1R

12 xj − trR

12 (B(j) − zIN)−1R

12 |6 ≤ KN−3v−6 log18 N (103)

All three moments when multiplied byn times any power oflog N , are summable. Applying standard argu-

ments using the Borel-Cantelli lemma and Boole’s inequality (on 4n events), we conclude that, for anyk > 0

logk N maxj≤n τjdja.s.−→ 0 asN → ∞. Hence Equations (88) and (89).

C. Existence and uniqueness ofm(0)N (z)

We show now that for anyN , n, S, R, N × N nonnegative definite andT = diag(τ1, . . . , τN ), τk ≥ 0 for all

1 ≤ k ≤ N , there exists a uniquee with positive imaginary part for which

e =1

Ntr

(S +

[∫τ

1 + cNτedFT(τ)

]R − zIN

)−1

R (104)

For existence we consider the subsequences{Nj}, {nj} with Nj = jN , nj = jn, so thatcNjremainscN , form

the block diagonal matrices

RNj= diag(R,R, . . . ,R), SNj

= diag(S,S, . . . ,S) (105)

both jN × jN and

TNj= diag(T,T, . . . ,T) (106)

29

of size jn × jn.

We see thatFTNj = FT and the right side of (104) remains unchanged for allNj . Consider a realization where

weNj

→ 0 as j → ∞. We have|eNj(z)| = |(jN)−1 trR(BjN − zIN )−1| ≤ v−1 log N , remaining bounded as

j → ∞. Consider then a subsequence for whicheNjconverges to, say,e. From (80), we see that

∣∣∣∣τ

1 + cNτeNj

∣∣∣∣ ≤ τ |z|v−1 (107)

so that from the dominated convergence theorem we have∫

τ

1 + cNτeNj(z)

dFT(τ) →∫

τ

1 + cNτedFT(τ) (108)

along this subsequence. Thereforee solves (104).

We now show uniqueness. Lete be a solution to (104) and lete2 = ℑ[e]. Recalling the definition ofD we write

e =1

Ntr

(D−1RD−H

(S +

[∫τ

1 + cNτe∗dFT(τ)

]R − z∗I

))(109)

We see that since bothR and S are Hermitian nonnegative definite,tr(D−1RD−HS

)is real and nonnegative.

Therefore we can write

e2 =1

Ntr

(D−1R(DH)−1

([∫cNτ2e2

|1 + cNτe|2 dFT(τ)

]R + vIN

))= e2α + vβ (110)

where we denoted

α =1

Ntr

(D−1R(DH)−1

[∫cNτ2

|1 + cNτe|2 dFT(τ)

]R

)(111)

β =1

Ntr(D−1R(DH)−1

)(112)

Let e be another solution to (104), withe2 = ℑ[e], and analogously we can writee2 = e2α + vβ. Let D denote

D with e replaced bye. Then we havee − e = γ(e − e) where

γ =

∫cNτ2

(1 + cNτe)(1 + cNτe)dFT(τ)

tr D−1RD−1R

N(113)

If R is the zero matrix, thenγ = 0, ande = e would follow. ForR 6= 0 we use Cauchy-Schwarz to find

|γ| ≤(∫

cNτ2

|1 + cNτe|2 dFT(τ)tr D−1R(DH)−1R

N

) 12

(∫cNτ2

|1 + cNτe|2 dFT(τ)tr D−1R(DH)−1R

N

) 12

(114)

= α12 α

12 (115)

=

(e2α

e2α + vβ

) 12(

e2α

e2α + vβ

) 12

(116)

Necessarilyβ andβ are positive sinceR 6= 0. Therefore|γ| < 1 so we must havee = e.

D. Termination of the proof

Let e0N denote the solution to (104). We show now for anyℓ > 0, almost surely

limN→∞

logℓ N(eN − e0N) = 0 (117)

30

Let e02 = ℑ[e0

N ], andα0 = α0N , β0 = β0

N be the values as above for whiche02 = e0

2α0 + vβ0. We have, using (72)

and (80),

e02α

0N/β0

N ≤ e02cN log N

∫τ2

|1 + cNτe0N |2 dFT(τ) (118)

= − log Nℑ[∫

τ

1 + cNτe0N

dFT(τ)

](119)

≤ log2 N |z|v−1 (120)

Therefore

α0 =

(e02α

0

e02α

0 + vβ0

)(121)

=

(e02α

0/β0

v + e02α

0/β0

)(122)

≤(

log2 N |z|v2 + log2 N |z|

)(123)

Let D0, D denoteD as above withe replaced by, respectivelye0N andeN . We have

eN =1

NtrD−1R − we

N (124)

With e2 = ℑ[eN ] we write as above

e2 =1

Ntr

(D−1RD−H

([∫cNτ2e2

|1 + cNτeN |2 dFT(τ)

]R + vIN

))−ℑ[we

N ] (125)

= e2α + vβ −ℑweN (126)

We have as aboveeN − e0N = γ(eN − e0

N ) + weN where now

|γ| ≤ α012 α

12 (127)

Fix an ℓ > 0 and consider a realization for whichlogℓ′ N weN → 0, whereℓ′ = max(ℓ + 1, 4) andn large enough

so that

|weN | ≤ v3

4cN |z|2 log3 N(128)

Supposeβ ≤ v2

4cN |z|2 log3 N. Then by Equations (72) and (80) we get

α ≤ cNv−2|z|2 log3 Nβ ≤ 1/4 (129)

which implies|γ| ≤ 1/2. Otherwise we get from (121) and (128)

|γ| ≤ α012

(e2α

e2α + vβ −ℑ[weN ]

) 12

(130)

≤(

log N |z|v2 + log N |z|

) 12

(131)

31

Therefore for allN large

logℓ N |eN − e0N | ≤ (logℓ N)we

N

1 −(

log2 N |z|v2+log2 N |z|

) 12

(132)

≤ 2v−2(v2 + log2 N |z|)(logℓ N)weN (133)

→ 0 (134)

asn → ∞. Therefore (117) follows.

Let m0N = N−1 trD0. We finally show

mN − m0N

a.s.−→ 0 (135)

asn → ∞. SincemN = N−1 trD−1 − wmN , we have

mN − m0N = γ(eN − e0

N ) − wmN (136)

where now

γ =

∫cNτ2

(1 + cNτeN )(1 + cNτe0N )

dFT(τ)tr D−1RD0−1

N(137)

From (72) and (80) we get|γ| ≤ cN |z|2v−4 log3 N . Therefore, from (88) and (117), we get (135).

Returning to the original assumptions onX11, T, andR, for each of a countably infinite collection ofz with

positive imaginary part, possessing a cluster point with positive imaginary part, we have (135). Therefore, by Vitali’s

convergence theorem, page 168 of [16], for anyε > 0 we have with probability onemN (z)−m0N(z) → 0 uniformly

in any region ofC bounded by a contour interior to

C \ ({z : |z| ≤ ǫ} ∪ {z = x + iv : x > 0, |v| ≤ ε}) (138)

If S = f(R), meaning the eigenvalues ofR are changed viaf in the spectral decomposition ofR, then we have

m0N (z) =

∫1

f(r) + r∫

τ1+cN τe0

N(z)

dFT(τ) − zdFR(r) (139)

e0N (z) =

∫r

f(r) + r∫

τ1+cN τe0

N(z)

dFT(τ) − zdFR(r) (140)

E. Extension toK ≥ 1

Suppose now

BN = S +

K∑

k=1

R12

k XkTkXH

kR12

k (141)

where K remains fixed,Xk is N × nk satisfying 1, theXk ’s are independent,Rk satisfies 2) and 4),Tk is

nk × nk satisfying 3) and 4),ck = N/nk satisfies 6), andS satisfies 5). After truncation and centralization we

may assume the same condition on the entries of theXk ’s, and the spectral norms of theRk ’s and theTk ’s. Write

yk,j = (1/√

nk)R12

k xk,j , with xk,j denoting thej-th column ofXk, and letτk,j denote thej-th diagonal element

of Tk. Then we can write

BN = S +

K∑

k=1

nk∑

j=1

τk,jyk,jyH

k,j (142)

32

Define

eN,k = eN,k(z) = (1/N) trRk(BN − zIN )−1 (143)

and

pk = − 1

nkz

nk∑

j=1

τk,j

1 + ckτk,jeN,k(144)

=

∫ −τk

1 + ckτkeN,kdFTk(τk) (145)

We seeeN,k and pk have the same properties aseN and pN . Let Bk,(j) = BN − τk,jyk,jyH

k,j . Define D =

−zIN + S −∑K

k=1 zpk(z)Rk. We write

BN − zIN − D =

K∑

k=1

nk∑

j=1

τk,jyk,jyH

k,j + zpk(z)Rk

(146)

Taking inverses and using Lemma 4, we have

D−1 − (BN − zIN )−1 =K∑

k=1

nk∑

j=1

τk,jD−1yk,jy

H

k,j(BN − zIN )−1 + zpkD−1Rk(BN − zIN)−1

(147)

=

K∑

k=1

nk∑

j=1

τk,j

D−1yk,jyH

k,j(Bk,(j) − zIN)−1

1 + τk,jyH

k,j(Bk,(j) − zIN )−1yk,j+ zpkD

−1Rk(BN − zIN )−1

(148)

Taking traces and dividing byN , we have

(1/N) trD−1 − mN(z) =

K∑

k=1

1

nk

nk∑

j=1

τk,jdk,j ≡ wmN (149)

where

dk,j =(1/N)xH

k,jR12

k (Bk,(j) − zIN )−1D−1R12

k xk,j

1 + τk,jyH

k,j(Bk,(j) − zIN)−1yk,j− (1/N) trRk(BN − zIN )−1D−1

1 + ckτk,jeN,k(150)

For a fixedk ∈ {1, . . . , K}, we multiply the above matrix identity byRk, take traces and divide byN . Thus we

get

(1/N) trD−1Rk − ek(z) =

K∑

k=1

1

nk

nk∑

j=1

τk,jdekkj ≡ we

k (151)

where

dekkj =

(1/N)xH

k,jR12

k (Bk,(j) − zIN)−1RkD−1R

12

k xk,j

1 + τk,jyH

k,j(Bk,(j) − zIN)−1yk,j− (1/N) trRk(BN − zIN)−1RkD

−1

1 + ckτk,jeN,k(152)

In exactly the same way as in the case withK = 1 we find that for any nonnegativeℓ, logℓ NwmN and the

logℓ wei ’s converge almost surely to zero. By considering block diagonal matrices as before withN , ni’s, S, Ri’s

andTi’s all fixed we find that there existe01, . . . , e

0K with positive imaginary parts for which for eachi

e0i =

1

NtrRi

(S +

K∑

k=1

[∫τ

1 + ckτe0k

dFTk(τ)

]Rk − zIN

)−1

(153)

33

Let us verify uniqueness. Lete0 = (e01, . . . , e

0K)T, and letD0 denote the matrix in (153) whose inverse is taken

(essentiallyD after theeN,i’s are replaced by thee0i ’s). Let for eachj, e0

j,2 = ℑe0j , ande0

2 = (e01,2, . . . , e

0K,2)

T. Then,

noticing that for eachi, trSD0−HRiD

0−1is real and nonnegative (positive wheneverS 6= 0) andtrD0−H

RiD0−1

and trRjD0−H

RiD0−1

are real and positive for alli, j, we have

e0i,2 = ℑ

1

Ntr

S +

K∑

j=1

[∫τ

1 + cjτe0j

dFTj (τ)

]Rj − z∗I

D0−H

RiD0−1

(154)

=

K∑

j=1

e0j,2

1

NtrRjD

0−H

RiD0−1

cj

∫τ2

|1 + cjτe0j |2

dFTj (τ) +v

NtrD0−H

RiD0−1

(155)

Let C0 = (c0ij), b0 = (b0

1, . . . , b0N )T, where

c0ij =

1

NtrRjD

0−H

RiD0−1

cj

∫τ2

|1 + cjτe0j |2

dFTj (τ) (156)

and

b0i =

1

NtrD0−H

RiD0−1

(157)

Therefore we havee02 satisfies

e02 = C0e0

2 + vb0 (158)

We see that eache0j,2, c0

ij , andb0j are positive. Therefore, from Lemma 9 we haveρ(C0) < 1.

Let e0 = (e01, . . . , e

0K)T be another solution to (153), withe0

2, D0, C0 = (c0ij), b0 defined analogously, so that

(158) holds andρ(C0) < 1. We have for eachi,

e0i − e0

i =1

NtrRiD

0−1K∑

j=1

(e0j − e0

j)cj

∫τ2

(1 + cjτe0j )(1 + cjτe0

j)dFTj (τ)RjD

0−1(159)

Thus withA = (aij) where

aij =1

NtrRiD

0−1RjD

0−1cj

∫τ2

(1 + cjτe0j )(1 + cjτe0

j)dFTj (τ) (160)

we have

e0 − e0 = A(e0 − e0) (161)

which means, ife0 6= e0, thenA has an eigenvalue equal to 1.

Applying Cauchy-Schwarz we have

|aij | ≤(

1

NRiD

0−1RjD

0−H

∫τ2

|1 + cjτe0j |2

dFTj (τ)

) 12(

1

NRiD

0−1RjD

0−H

∫τ2

|1 + cjτe0j |2

dFTj (τ)

) 12

(162)

= c0ij

1/2c0ij

1/2(163)

Therefore from Lemmas 10 and 11 we get

ρ(A) ≤ ρ(c0ij

12 c0

ij

12 ) ≤ ρ(C0)

12 ρ(C0)

12 < 1 (164)

34

a contradiction to the statementA has an eigenvalue equal to 1. Consequently we havee = e.

Let eN = (eN,1, . . . , eN,K)T ande0N = (e0

N,1, . . . , e0N,K) denote the vector solution to (153) for eachN . We

will show for anyℓ > 0, almost surely

limN→∞

logℓ N(eN − e0N ) → 0 (165)

We have

e0N = (

1

NtrR1D

0−1, . . . ,

1

NtrRKD0−1

)T (166)

Let we = weN = −(we

1, . . . , weK)T. Then we can write

eN = (1

NtrR1D

−1, . . . ,1

NtrRKD−1)T + we (167)

Therefore

eN − e0N = A(N)(eN − e0

N ) + we (168)

whereA(N) = (aij(N)) with

aij(N) =1

NtrRiD

−1RjD0−1

cj

∫τ2

(1 + cjτeN,j)(1 + cjτe0N,j)

dFTj (τ) (169)

We let e0N,2, b0

ij(N), C0(N), b0N,i, andb0

N , denote the quantities from above, reflecting now their dependence on

N . Let C(N) = (cij(N)) be K × K with

cij(N) =1

NtrRjD

−HRiD−1cj

∫τ2

|1 + cjτeN,j |2dFTj (τ) (170)

Let eN,2 = ℑ[eN ] andwe2 = ℑ[we]. DefinebN = (bN,1, . . . , bN,K)T with

bN,i =1

NtrD−HRiD

−1 (171)

Then, as above we find that

eN,2 = C(N)eN,2 + vbN + we2 (172)

Using (72) and (80) we see there exists a constantK1 > 0 for which

c0ij(N) ≤ K1 log3 Nb0

N,i (173)

and

cij(N) ≤ K1 log3 NbN,i (174)

cij(N) ≤ K1 log4 N (175)

for eachi, j. Therefore, from (158) we see there existsK > 0 for which

e0N,i ≤ K log4 Nvb0

N,i (176)

Let x be such thatxT is a left eigenvector ofC0(N) corresponding to eigenvalueρ(C0(N)), guaranteed by Lemma

12. Then from (172) we have

xTe0N,2 = ρ(C0(N))xTe0

N,2 + vxTb0N (177)

35

Using (177) we have

1 − ρ(C0(N)) =vxTb0

N

xTe0N,2

≥ (K log4 N)−1 (178)

Fix an ℓ > 0 and consider a realization for whichlogℓ+3+p NweN → 0, asN → ∞, wherep ≥ 12K − 7. We

will show for all N large

ρ(C(N)) ≤ 1 + (K log4 N)−1 (179)

For eachN we rearrange the entries ofeN,2, vbm + we2, and C(n) depending on whether theith entry of

vbm + we2 is greater than, or less than or equal to zero. We can therefore assume

C =

C11(N) C12(N)

C21(N) C22(N)

(180)

whereC11(N) is k1×k1, C22(N) is k2×k2, C12(N) is k1×k2, andC21(N) is k2×k1. From Lemma 9 we have

ρ(C11(N)) < 1. If vbN,i + we2,i ≤ 0, then necessarilyvbN,i ≤ |we

N | ≤ K1(log n)−(3+p), and so from (176) we

have the entries ofC21(N) andC22(N) bounded byK1(log N)−p. We may assume for allN large0 < k1 < K,

since otherwise we would haveρ(C(N)) < 1.

We seek an expression fordet(C(N)−λIN ) in which Lemma 14 can be used. We considerN large enough so

that, for |λ| ≥ 1/2, we have(C22(N) − λIN )−1 existing with entries uniformly bounded. We have

det(C(N) − λI) = det

I −C12(N)(C22(N) − λI)−1

0 I

C11(N) − λI C12(N)

C21(N) C22(N) − λI

(181)

= det

C11(N) − λI − C12(N)(C22(N) − λI)−1C21(N) 0

C21(N) C22(N) − λI

(182)

= det(C11(N) − λI − C12(N)(C22(N) − λI)−1C21(N)) det(C22(N) − λI) (183)

We see then that forλ = ρ(C(N)) real and greater than1,

det(C11(N) − λI − C12(N)(C22(N) − λI)−1C21(N)) (184)

must be zero.

Notice that from (176), the entries ofC12(N)(C22(N)− λI)−1C21(N) can be made smaller than any negative

power oflog N for p sufficiently large. Notice also that the diagonal elements of C11(N) are all less than1. From

this, Lemma 13 and (176), we see thatρ(C(N)) ≤ K1 log4 N . The determinant in (184) can be written as

det(C11(N) − λI) + g(λ) (185)

Whereg(λ) is a sum of products, each containing at least one entry fromC12(N)(C22(N)−λI)−1C21(N). Again,

from (176) we see that for all|λ| ≥ 1/2, g(λ) can be made smaller than any negative power oflog N by making

p sufficiently large. Choosep so that|g(λ)| < (K log N)−4k1 for theseλ. It is clear that anyp > 8k1 + 4 will

36

suffice. Letλ1, . . . , λk1 denote the eigenvalues ofC11. Sinceρ(C11) < 1, we see that for|λ| ≥ (K log N)−4, we

have

| det(C11(N) − λI)| = |k1∏

i=1

(λi − λ)| (186)

> (K log N)−4k1 (187)

Thus withf(λ) = det(C11(N)− λI), a polynomial, andg(λ) being a rational function, we have the conditions of

Lemma 14 being met on any rectangleC, with vertical lines going through((K log N)−4, 0) and(K1(log N)4, 0).

Therefore, sincef(λ) has no zeros insideC, neither doesdet(C(N) − λI). Thus we get (179).

As before we see that

|aij(N)| ≤ c1/2ij (N)c0

ij1/2

(N) (188)

Therefore, from (178), (179), and Lemmas 10 and 11, we have for all N large

ρ(A(N)) ≤(

K2 log8 N − 1

K2 log8 N

) 12

(189)

For theseN we have thenI − A(N) invertible, and so

eN − e 0N = (I − A(N))−1we (190)

By (72) and (80) we have the entries ofA(N) bounded byK1 log4 N . Notice also, from (189)

| det(I − A(N))| ≥ (1 − ρ(A(N))K ≥

K2 log8 N

(1 +

K2 log8 N − 1

K2 log8 N

) 12

−K

≥ (2K2 log8 N)−K (191)

When considering the inverse of a square matrix in terms of its adjoint divided by its determinant, we see that

the entries of(I − A(N))−1 are bounded by

(K − 1)!K1(log N)4(K−1)

| det(I − A(N))| ≤ K3(log N)12K−4 (192)

Therefore, sincep ≥ 12K−7 (> 8k1 +4), (165) follows on this realization, an event which occurs with probability

one.

Letting m0N = 1

N trD0−1, we have

mN − m0N = ~γT(eN − e0

N) (193)

where~γ = (γ1, . . . , γK)T with

γj =

∫cNτ2

(1 + cNτeN,j)(1 + cnτe0N,j)

dFTN (τ)tr D−1RjD

0−1

N(194)

From (72) and (80) we get each|γj | ≤ cN |z|2v−4 log3 N . Therefore from (165) and the fact thatwmN → 0, we

have

mN − m0N → 0 (195)

almost surely, asN → ∞.

This completes the proof.

37

APPENDIX B

PROOF OFTHEOREM 2

We first prove thatV(0)(x) as defined in Equation (22) verifies

V(0)(x) =

∫ ∞

x

(1

w− m

(0)N (−w)

)dw (196)

and then we prove that, under the conditions of Theorem 2,V(0)(x) defined as such verifies

V(0)(x) − V(x)

a.s.−→ 0 (197)

A. Proof of (196)

First, observe that we can rewriteei(z) under the symmetric form,

ei(z) =1

NtrRi

(−z

[IN +

K∑

k=1

δkRk

])−1

(198)

δi(z) =1

nitrTi (−z [Ini

+ ciei(z)Ti])−1 (199)

and then form(0)N (z),

m(0)N (z) =

1

Ntr

(−z

[IN +

K∑

k=1

δkRk

])−1

(200)

Now, notice that

1

z− m

(0)N (−z) =

1

N

(zI)

−1 −(

z

[IN +

K∑

k=1

δkRk

])−1 (201)

=

K∑

k=1

δk(−z) · ek(−z) (202)

Since the Shannon transformV(x) satisfiesV(x) =∫ +∞

x[w−1 − mN (−w)]dw, we need to find an integral form

for∑K

k=1 δk(−z) · ek(−z). Notice now that

d

dz

1

Nlog det

(IN +

K∑

k=1

δk(−z)Rk

)= −z

K∑

k=1

ek(−z) · δ′k(−z)

d

dz

1

Nlog det (Ink

+ ckek(−z)Tk) = −z · e′k(−z) · δk(−z) (203)

d

dz

(z

K∑

k=1

δk(−z)ek(−z)

)=

K∑

k=1

δk(−z)ek(−z) − z

K∑

k=1

δ′k(−z) · ek(−z) + δk(−z) · e′k(−z) (204)

Combining the last three lines, we have

K∑

k=1

δk(−z)ek(−z) =

d

dz

[− 1

Nlog det

(IN +

K∑

k=1

δk(−z)Rk

)−

K∑

k=1

1

Nlog det (Ink

+ ckek(−z)Tk) + z

K∑

k=1

δk(−z)ek(−z)

](205)

38

which after integration leads to∫ +∞

z

(1

w− m

(0)N (−w)

)dw =

1

Nlog det

(IN +

K∑

k=1

δk(−z)Rk

)+

K∑

k=1

1

Nlog det (Ink

+ ckek(−z)Tk) − z

K∑

k=1

δk(−z)ek(−z) (206)

which is exactly the right-hand side of (22).

B. Proof of (197)

Consider now the existence of a nonrandomα and for eachN a non-negative integerrN for which

maxi≤K

max(λTi

rN +1, λRi

rN+1) ≤ α (207)

(eigenvalues also arranged in non-increasing order). Thenfor eachi

λR

12i

XiTiXH

i R12i

2rN +1 = (sR

12i

XiT12i

2rN+1 )2 (208)

≤ α2‖XiXH

i ‖ (209)

and then we have, from Lemma 15,

λBN

2KrN +1 ≤ α2(‖X1XH

1 ‖ + · · · + ‖XKXH

K‖) (210)

We can in fact consider that the spectral norms of theXi are bounded in the limit. Either Gaussian assumptions

on the components, or finite fourth moment, but all coming from doubly infinite arrays (remember though that we

need the right-unitary invariance structure ofXi). Because of assumption 5 in Corollary 1, we can, by enlarging

the sample space, assume eachXi is embedded in anN × n′i matrix X′

i, whereN/n′i → a as N → ∞. Then,

with probability one (see e.g. [33]),

lim supN

λBN

2KrN+1 ≤ lim supN

α2(‖X′1X

′1H‖ + · · · + ‖X′

KX′K

H‖)

≤ α2K(b/a)(1 +√

a)2 (211)

Let a0 be any real greater thanα2K(b/a)(1 +√

a)2.

SinceS = 0 here, it follows as in [4] that{FBn} is almost surely tight. LetF 0N denote the distribution function

having Stieltjes transformm(0)N , and letf on [0,∞) be a continuous function. Then the function

fa0(x) =

f(x) , x ≤ a0

f(a0) , x > a0(212)

is bounded and continuous. Therefore, with probability1,∫

fa0(x)dFBN (x) −∫

fa0(x)dF 0N (x) → 0 (213)

asN → ∞.

39

Suppose nowrN = o(N). Then, since almost surely there are at most2KrN eigenvalues greater thana0 for all

N large, any converging subsequence of{F 0N} must have some mass lying on[0, a0]. This implies, with probability

1,1

N

λi≤a0

f(λi) −∫

[0,a0]

f(x)dF 0N (x) → 0 (214)

asN → ∞.

Let bN be a bound on the spectral norms of theTi andRi. Then

‖Bn‖ ≤ b2N(‖X′

1X′1H‖ + · · · + ‖X′

KX′K

H‖) (215)

Fix a numberβ > K(b/a)(1+√

a)2, and letaN = b2Nβ. Suppose also thatf is increasing and thatf(aN )rN =

o(N). Then ∫f(x)dFBn(x) − 1

N

λi≤a0

f(λi) → 0 (216)

almost surely, asN → ∞. Therefore, with probability1,∫

f(x)dFBN (x) −∫

[0,a0]

f(x)dF 0N (x) → 0 (217)

asN → ∞.

For anyN we consider, forj = 1, 2, . . ., the jN × jN matrix BN,j formed, as before, from block diagonal

matrices andjN × jni matrices of i.i.d. variables. Then with probability1, FBN,j converges weakly toF 0N as

j → ∞. Properties on the eigenvalues ofBN,j will thus yield properties ofF 0N .

By considering the bound on‖Bn,j‖ analogous to (215), we must haveF 0N (aN ) = 1 for all N large.

Similar to (211) we see that, with probability1

lim supj

λBN,j

2KjrN +1 ≤ a2((1 +√

c1)2 + · · · + (1 +

√cK)2) (218)

this latter number being less thana0 for all N large.

At this point we will use the fact that for probability measuresPN , P on R with PN converging weakly toP ,

we have (see e.g. [34])

lim infN

PN (G) ≥ P (G) (219)

for any open setG. Thus, withG = (a0,∞) we see that, with probability1, for all N large

F 0N ((a0,∞)) = 1 − F 0

N (a0) ≤ lim infj

FBN,j ((a0,∞)) (220)

≤ 2KrN/N (221)

Therefore, for allN large ∫

(a0,∞)

f(x)dF 0N (x) ≤ f(aN)2KrN/N → 0 (222)

asN → ∞.

40

Therefore, we conclude that,∫

f(x)dF 0N (x) is bounded, and with probability1

∫f(x)dFBN (x) −

∫f(x)dF 0

N (x) → 0 (223)

asN → ∞. This concludes the proof.

APPENDIX C

PROOF OFPROPOSITION1

The proof stems from the following result,

Proposition 3: f(P1, . . . ,PK) is a strictly concave matrix in the Hermitian nonnegative definite matricesP1, . . . ,PK ,

if and only if, for any couples(P1a,P1b

), . . . , (PKa,PKb

) of Hermitian nonnegative definite matrices, the function

φ(λ) = f (λP1a+ (1 − λ)P1b

, . . . , λPKa+ (1 − λ)PKb

) (224)

is strictly concave.

Let us use a similar notation as in (31) of the capacity,

I(λ) = I(λP1a+ (1 − λ)P1b

, . . . , λP|S|a + (1 − λ)P|S|b) (225)

and consider a set(δk, ek,P1, . . . ,P|S|) which satisfies the system of equations (31)-(33). Then, from remark (36)

and (37),

dI

dλ=∑

k∈S

∂V

∂δk

∂δk

∂λ+

∂V

∂ek

∂ek

∂λ+

∂V

∂λ(226)

=∂V

∂λ(227)

where

V : (δ1, . . . , δ|S|, e1, . . . , e|S|, λ) 7→ I(λ) (228)

Mere derivations ofV lead then to

∂2V

∂λ2= −

i∈S

(c2i e

2i )

1

Ntr (I + cieiRiPi)

−2(Ri(Pia

− Pib))2 (229)

Since ei > 0 on the strictly negative real axis, if any of theRi’s is positive definite, then, for all nonnegative

definite couples(Pia,Pib

), such thatPia6= Pib

, I ′′ < 0. Then, from Proposition 3, the deterministic approximate

on the right-hand side of (27) is strictly concave inP1, . . . ,P|S| if any of theRi matrices is invertible.

APPENDIX D

USEFUL LEMMAS

In this section, we gather most of the known or new lemmas which are needed in various places in Proof A.

The statements in the following Lemma are well-known

Lemma 1: 1) For rectangular matricesA, B of the same size,

rank(A + B) ≤ rank(A) + rank(B) (230)

41

2) For rectangular matricesA, B for which AB is defined,

rank(AB) ≤ min(rank(A), rank(B)) (231)

3) For rectangularA, rank(A) is less than the number of non-zero entries ofA.

Lemma 2: (Lemma 2.4 of [4]) ForN × N Hermitian matricesA andB,

‖FA − FB‖ ≤ 1

Nrank(A − B) (232)

From these two lemmas we get the following.

Lemma 3:Let S, A, A, be HermitianN × N , Q, Q both N × n, andB, B both Hermitiann × n. Then

1)

‖FS+AQBQHA − FS+AQBQHA‖ ≤ 2

Nrank(Q− Q) (233)

2)

‖FS+AQBQHA − FS+AQBQHA‖ ≤ 2

Nrank(A − A) (234)

and

3)

‖FS+AQBQHA − FS+AQBQHA‖ ≤ 1

Nrank(B− B) (235)

Lemma 4:For N × N A, τ ∈ C andr ∈ CN for which A andA + τrrH are invertible,

rH(A + τrrH)−1 =1

1 + τrHA−1rrHA−1 (236)

This result follows fromrHA−1(A + τrrH) = (1 + τrHA−1r)rH.

Moreover, we recall Lemma 2.6 of [4]

Lemma 5:Let z ∈ C+ with v = ℑ[z], A andB N × N with B Hermitian, andr ∈ C

N . Then

∣∣tr((B − zIN)−1 − (B + rrH − zIN )−1

)A∣∣ =

∣∣∣∣rH(B − zIN )−1A(B − zIN)−1r

1 + rH(B − zIN)−1r

∣∣∣∣ ≤‖A‖

v. (237)

From Lemma 2.2 of [17], and Theorems A.2, A.4, A.5 of [18], we have the following

Lemma 6: If f is analytic onC+, both f(z) and zf(z) map C+ into C+, and there exists aθ ∈ (0, π/2) for

which zf(z) → c, finite, asz → ∞ restricted to{w ∈ C : θ < argw < π − θ}, thenc < 0 andf is the Stieltjes

transform of a measure on the nonnegative reals with total mass−c.

Also, from [4], we need

Lemma 7:Let y = (y1, . . . , yN)T with the yi’s i.i.d. such thatEy1 = 0, E|y1|2 = 1 andy1 ≤ log N , andA an

N × N matrix, then

E|yHAy|6 ≤ K‖A‖6N3 log12 N (238)

whereK does not depend onN , A, nor on the distribution ofy1.

Additionally, we need

42

Lemma 8:Let D = A + iB + ivI, whereA, B are N × N Hermitian,B is also positive semi-definite, and

v > 0. Then‖D−1‖ ≤ v−1.

Proof: We haveDDH = (A + iB)(A − iB) + v2I + 2vB. Therefore the eigenvalues ofDDH are greater or

equal tov2, which implies the singular values ofD are greater or equal tov, so that the singular values ofD−1

are less or equal tov−1. We therefore get our result.

From Theorem 2.1 of [28],

Lemma 9:Let ρ(C) denote the spectral radius of theN ×N matrix C (the largest of the absolute values of the

eigenvalues ofC). If x,b ∈N with the components ofC, x, andb all positive, then the equationx = Cx + b

implies ρ(C) < 1.

From Theorem 8.1.18 of [29],

Lemma 10:SupposeA = (aij) andB = (bij) areN × N with bij nonnegative and|aij | ≤ bij . Then

ρ(A) ≤ ρ((|aij |)) ≤ ρ(B) (239)

Also, from Lemma 5.7.9 of [30],

Lemma 11:Let A = (aij) andB = (bij) be N × N with aij , bij nonnegative. Then

ρ((a12

ijb12

ij)) ≤ (ρ(A))12 (ρ(B))

12 (240)

And, Theorems 8.2.2 and 8.3.1 of [29],

Lemma 12:If C is a square matrix with nonnegative entries, thenρ(C) is an eigenvalue ofC having an

eigenvectorx with nonnegative entries. Moreover, if the entries ofC are all positive, thenρ(C) > 0 and the entries

of x are all positive.

From [30], we also need Theorem 6.1.1,

Lemma 13: Gervsgorin’s TheoremAll the eigenvalues of anN × N matrix A = (aij) lie in the union of the

N disks in the complex plane, theith disk having centeraii and radius∑

j 6=i |aij |.Theorem 3.42 of [16],

Lemma 14: Rouche’s TheoremIf f(z) andg(z) are analytic inside and on a closed contourC of the complex

plane, and|g(z)| < |f(z)| on C, thenf(z) andf(z) + g(z) have the same number of zeros insideC.

In order to prove Theorem 2, we also need, from [32]

Lemma 15:Consider a rectangular matrixA and letsAi denote theith largest singular value ofA, with sA

i = 0

wheneveri > rank(A). Let m, n be arbitrary non-negative integers. Then forA, B rectangular of the same size

sA+Bm+n+1 ≤ sA

m+1 + sBn+1 (241)

And for A, B rectangular for whichAB is defined

sABm+n+1 ≤ sA

m+1sBn+1 (242)

As a corollary, for any integerr ≥ 0 and rectangular matricesA1, . . . ,AK , all of the same size,

sA1+···+AK

Kr+1 ≤ sA1r+1 + · · · + sAK

r+1 (243)

43

ACKNOWLEDGMENT

Silverstein’s work is supported by the U.S. Army Research Office under Grant W911NF-05-1-0244. Debbah’s

work is partially supported by the European Commission in the framework of the FP7 Network of Excellence in

Wireless Communications NEWCOM++.

The authors would like to thank Walid Hachem for the fruitfuldiscussions we shared concerning some results

of this paper.

REFERENCES

[1] S. Sesia, I. Toufik and M. Baker, “LTE: The UMTS Long Term Evolution, From Theory to Practice,” Wiley and Sons, 2009.

[2] A. M. Tulino, A. Lozano and S. Verdu, “Impact of antenna correlation on the capacity of multiantenna channels,” IEEE Trans. on

Information Theory, vol. 51, no. 7, pp. 2491-2509, 2005.

[3] M. J. M. Peacock, I. B. Collings and M. L. Honig, “Eigenvalue distributions of sums and products of large random matrices via incremental

matrix expansions,” IEEE Trans. on Information Theory, vol. 54, no. 5, pp. 2123-2138, 2008.

[4] J. Silverstein and Z. Bai, “On the empirical distribution of eigenvalues of a class of large dimensional random matrices,” Journal of

Multivariate Analysis, vol. 54, issue 2, pp. 175–192, 1995

[5] G. Taricco, E. Riegler, “On the Ergodic Capacity of the Asymptotic Separately-Correlated Rician Fading MIMO Channel with Interference,”

IEEE International Symposium on Information Theory, pp. 531-535, 2007

[6] G. Foschini and M. Gans, “On Limits of Wireless Communications in a Fading Environment when Using Multiple Antennas,” Wireless

Personal Communications, vol. 6, no. 3, pp. 311-335, 1998.

[7] E. Telatar, “Capacity of multi-antenna Gaussian channels,” European transactions on telecommunications, vol. 10, no. 6, pp. 585-595,

1999.

[8] A. Paulraj, R. Nabar and D. Gore, “Introduction to Space-Time Wireless Communications,” Cambridge University Press, 2003.

[9] C. Chuah, D. Tse, J. Kahn, and R. Valenzuela, “Capacity Scaling in MIMO Wireless Systems under Correlated Fading,” IEEE Trans. on

Information Theory, pp. 637650, 2002.

[10] A. Moustakas, S. Simon and A. Sengupta, “MIMO Capacity Through Correlated Channels in the Presence of Correlated Interferers and

Noise: A (Not So) LargeN Analysis,” IEEE Trans. on Information Theory, vol. 49, no. 10, 2003.

[11] C. K. Wen, Y. N. Lee, J. T. Chen, and P. Ting, “Asymptotic spectral efficiency of MIMO multiple-access wireless systems exploring only

channel spatial correlations,” IEEE Transactions on Signal Processing, vol. 53, no. 6, pp. 2059-2073, 2005.

[12] D. Guo and S. Verdu, “Multiuser Detection and Statistical Physics,” Communications on Information and Network Security, Kluwer

Academic Publishers, 2003.

[13] S. Shamai and A. Wyner, “Information-Theoretic Considerations for Symmetric, Cellular, Multiple-Access FadingChannels - Part I,” IEEE

Trans. on Information Theory, vol. 43, no. 6, 1997.

[14] B. Zaidel, S. Shamai and S. Verdu, “Multicell Uplink Spectral Efficiency of Coded DS-CDMA With Random Signatures,”IEEE Journal

on Selected Areas in Communications, vol. 19, no. 8, 2001.

[15] Z. Bai and J. Silverstein, “On the signal-to-interference-ratio of CDMA systems in wireless communications,” Annals of Applied Probability

vol. 17 no. 1, pp. 81-101, 2007.

[16] E. C. Titchmarsh, “The Theory of Functions,” Oxford University Press, 1939.

[17] J. A. Shohat and J. D. Tamarkin, “The Problem of Moments,” American Mathematical Society, Providence, RI, 1970.

[18] M. K. Krein and A. A. Nudelman, “The Markov Moment Problem and Extremal Problems,” American Mathematical Society, Providence,

RI, 1997.

[19] R. B. Dozier and J. W. Silverstein, “On the Empirical Distribution of Eigenvalues of Large Dimensional Information-Plus-Noise Type

Matrices,” Journal of Multivariate Analysis 98(4), pp. 678-694, 2007.

[20] S. Vishwanath, N. Jindal and A. Goldsmith, “Duality, Achievable Rates, and Sum-Rate Capacity of Gaussian MIMO Broadcast Channels,”

IEEE Trans. on Information Theory, vol. 49, no. 10, 2003.

44

[21] P. Viswanath and D. Tse, “Sum Capacity of the Multiple Antenna Gaussian Broadcast Channel and Uplink-Downlink Duality,” IEEE

Trans. on Information Theory, vol. 49, no. 8, pp. 19121923, 2003.

[22] S. Verdu, “Multiple-access channels with memory withand without frame synchronism,” IEEE Trans. on InformationTheory, vol. 35, pp.

605-619, 1989.

[23] H. Weingarten, Y. Steinberg and S. Shamai, “The Capacity Region of the Gaussian Multiple-Input Multiple-Output Broadcast Channel,”

IEEE Trans. on Information Theory, vol. 52, no. 9, pp. 3936-3964, 2006.

[24] A. Soysal and S. Ulukus “Optimality of Beamforming in Fading MIMO Multiple Access Channels,”to be published.

[25] A.M. Tulino, A. Lozano and S. Verdu, “Impact of antenna correlation on the capacity of multiantenna channels,” IEEETrans. on Information

Theory, vol. 51, no. 7, pp. 2491-2509, 2005.

[26] W. Hachem, Ph. Loubaton and J. Najim, “Deterministic Equivalents for Certain Functionals of Large Ranfom Matrices,” The Annals of

Applied Probability, Vol. 17, No. 3, 875930, 2007.

[27] J. Dumont, S. Lasaulce, W. Hachem, Ph. Loubaton and J. Najim, “On the Capacity Achieving Covariance Matrix for Rician MIMO

Channels: An Asymptotic Approach”, submitted to IEEE Trans. on Information Theory, Octggober 2007, revised in October2008.

[28] E. Seneta, “Non-negative Matrices and Markov Chains,”Second Edition, Springer Verlag New York, 1981.

[29] R.A. Horn and C.R. Johnson, “Matrix Analysis,” Cambridge University Press, 1985.

[30] R.A. Horn and C.R. Johnson, “Topics in Matrix Analysis,” Cambridge University Press, 1991.

[31] Z. D. Bai and Y. Q. Yin, “Limit of the smallest eigenvalueof a large dimensional sample covariance matrix,” The annals of Probability,

pp. 1275-1294, Institute of Mathematical Statistics, 1993.

[32] K. Fan, “Maximum properties and inequalities for the eigenvalues of completely continuous operators,” Proc. Nat.Acad. Sci. U.S.A. 37

760-766, 1951.

[33] Z. Bai and J. W. Silverstein, “Spectral Analysis of Large Dimensional Random Matrices,” Science Press, Beijing, 2006.

[34] P. Billingsley, “Convergence of Probability Measures,” John Wiley & Sons, Second Revised Edition, 1999.

[35] M. Debbah and R. R. Muller, “MIMO channel modeling and the principle of maximum entropy,” IEEE Transactions on Information

Theory, vol. 51, no. 5, pp. 1667-1690, 2005.