A deterministic equivalent for the capacity analysis of correlated multi-user MIMO channels
Romain Couillet†,‡, Merouane Debbah‡, Jack Silverstein⊥
Abstract
This paper analyzes capacity expressions for multi-user and multi-cell wireless communication
schemes in which the transmitters and the receivers have a large number of correlated antennas. Mathematically, our main
contribution is a deterministic equivalent of the Shannon transform of a class of large dimensional
random matrices; these are sums of Gram matrices with separable variance profiles, whose covariance matrices
may have asymptotically large eigenvalues. On the application side, this class of large matrices is used
here to model (i) multi-antenna multiple access (MAC) and broadcast channels (BC) with transmit and
receive channel correlation, and (ii) multiple-input multiple-output (MIMO) communications with inter-cell interference
and channel correlation both at the base stations and at the receivers. The theoretical capacity formulas obtained for
these models extend the classical results on multi-user MIMO capacities in independent and identically distributed
(i.i.d.) Gaussian channels to the more realistic Gaussian channels with separable variance profile. From an
information-theoretic viewpoint, this article provides, in scenario (i): an asymptotic description of the MAC and BC rate regions as
a function only of the transmit and receive correlation matrices (thus independently of the random channel realizations);
an asymptotic expression of the capacity-maximizing covariance matrices in the uplink MAC and downlink BC; and a
water-filling algorithm which, upon convergence, is proved to converge to the capacity-achieving power allocation at
the transmitters in both the MAC and the BC. In scenario (ii), the article provides: an expression of the single-user decoding
capacity and minimum mean square error (MMSE) decoding capacity when interference is treated as Gaussian noise
with a known variance profile; and, for single-user decoding, the capacity-achieving antenna power allocation policy at
the transmitter.
I. INTRODUCTION
When mobile networks were expected, some time ago, to run out of power and frequency resources while being
simultaneously subject to an increasing demand for achievable data rates, Foschini [6] and Telatar [7] introduced
the notion of multiple input multiple output (MIMO) systems and predicted a capacity growth by a factor of
min(N, n) for a communication between an n-antenna transmitter and an N-antenna receiver when the propagation
†ST-Ericsson, 505 Route des Lucioles, 06560 Sophia Antipolis
‡Alcatel-Lucent Chair on Flexible Radio, Supelec, 3 rue Joliot-Curie, 91192 Gif sur Yvette
⊥Department of Mathematics, North Carolina State University, Raleigh, North Carolina 27695-8205
channel matrix is formed of independent and identically distributed (i.i.d.) Gaussian entries. However, in practical
systems, this tremendous multiplexing gain can only be provided for large signal-to-interference plus noise ratios
(SINR) and for uncorrelated transmit and receive antenna arrays at both communication sides. In present wireless
mobile networks, the scarcity of available frequency resources has led to a widespread incentive for MIMO
communications. Due to space limitations, mobile designers now embed more and more antennas in small devices,
which inevitably spawns non-negligible correlation patterns at the antenna arrays and thus non-negligible effects on
the achievable transmission rates. Since MIMO systems come along with a tremendous increase in signal processing
requirements (which naturally imply a large non-linear increase in power consumption), both base station and mobile
manufacturers need to accurately assess the exact cost of bit rate increases on finite size devices, prone to experience
strong transmit and receive correlation. Our intention is to evaluate the antenna efficiency, i.e. the mean per-antenna
achievable rate, for different communication models detailed in the following.
Multi-cell and multi-user systems are among the scenarios of main interest to cellular service providers. The scope
of the present study (which, we will see, can be extended to a larger set of mobile communications scenarios) lies
in the following two wireless communication systems, which the authors consider of most valuable interest:
1) the multiple access channels (MAC) in which K transmitters, hereafter identified with mobile terminal users,
transmit information to a unique receiver, hereafter referred to as the base station; and the dual broadcast
channels (BC) in which the base station multi-casts information to the K users. While the major scientific
breakthroughs in multi-antenna broadcast channels are quite recent, see e.g. [20], the practical applications
are foreseen to arise in the near future with the so-called multi-user MIMO techniques to be used in future
long term evolution standards, e.g. [1].
2) the single-user decoding and minimum mean square error (MMSE) decoding [8] in multi-cell scenarios. In
most current mobile communication systems, the wireless networks are composed of multiple overlapping
cells, controlled by non-cooperating base stations. In these conditions, the achievable rates for every user in
a cell, assuming no intra-cell interference, correspond to the capacity of the single-user decoding scheme in
which interfering signals are treated as Gaussian noise with a known correlation pattern. However, single-user
decoders are not linear decoders and are often replaced in practical applications by the simpler linear MMSE
decoders; these decoders maximize the signal-to-interference plus noise ratio (SINR) at the receiver.
The achievable rate region of MIMO MAC and BC channels for generic channels has been known since the
successive contributions [20]-[21], which established an important duality link between MAC rate regions and BC
rate regions in vector channels, both in single antenna and MIMO channels. These important contributions provide
the general descriptions of the rate regions under fixed wireless channels, without any underlying transmission
model. As a consequence, the exact expressions obtained are often intractable and generally do not provide any
insight into the effect of the various channel parameters on the resulting achievable data rate. The mathematical field of
large random matrices allows one to circumvent this issue by providing closed-form approximations of achievable
rates as a function of the most relevant channel parameters only¹. The most notable result in the line of the present
study for random channels is due to Tulino [2], who provides an asymptotic capacity expression of point-to-point
MIMO systems when the channel random matrix is composed of i.i.d. Gaussian entries. This result naturally extends
to multiple users by dividing the MIMO channel matrix into K sub-matrices; this therefore allows one to obtain
a description of the MAC and BC rate regions for the uncorrelated Gaussian channels, which is shown in [2] to
depend only (i) on the signal to noise ratio (SNR) at the receivers and (ii) on the ratio c = N/n between the
number N of transmit antennas at the base station and the number n of receive antennas at all receivers. Tulino
also provides an expression of the capacity-achieving power allocation policy at the base station. In [24], Ulukus
derives the capacity-achieving power allocation policy for a finite number of antennas at all transmit/receive devices
in the case of $K$ users whose channels $\mathbf{H}_k$, $1 \le k \le K$, are modelled as Kronecker channels, i.e. Gaussian with
separable variance profile, $\mathbf{H}_k = \mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k^{1/2}$, where $\mathbf{R}_k$, $\mathbf{T}_k$ are Hermitian nonnegative definite² and $\mathbf{X}_k$ is random
with i.i.d. Gaussian entries; however, Ulukus (i) does not provide a theoretical large dimensional expression of the
resulting capacity, (ii) makes the strong assumption that all $\mathbf{R}_k$ matrices are equal. It is in fact rather straightforward
to observe that, under the assumption that all $\mathbf{R}_k$ share the same eigenspace, the result from [2] extends
to the Kronecker channel model, see e.g. [3]. When the $\mathbf{R}_k$ matrices have no trivial relationship, the problem
is more complex and requires different mathematical tools. Those tools allow us in the present work to obtain a
deterministic equivalent of the per-antenna rate regions associated with the channels $\mathbf{H}_k$: this is our main result, which
we provide in Theorem 2. It is important to note that the final formula of Theorem 2 has already been stated and
used by Chen, Equation (32) in [11]; however, the latter is provided without any proof or any hypotheses on the
considered matrices, and merely stems from the previous Equation (6), which is valid when all $\mathbf{R}_k$ matrices have
the same eigenspace but is incorrect in the general case, as will become clear in the following. Chen also
provides the iterative water-filling algorithm which we will also obtain in the course of this paper (see Table I);
however, the convergence of this algorithm to the correct capacity, which we will prove, is not provided in [11].
Regarding multi-cell networks, to the authors’ knowledge, few contributions treat simultaneously the problem of
multi-cell interference in more structured channel models than i.i.d. Gaussian channels. In [13], the authors carry out
the performance analysis of TDMA-based networks with inter-cell interference. In [14], a random matrix approach
is used to study large CDMA-based networks with inter-cell interference. In our particular MIMO context, it is
important to mention the work of Moustakas [10] who conjectures an analytic solution to the single-user decoding
problem with channel correlation and interference with known variance profile, using the replica method [12].³
Practical channel models, as recalled earlier, prove more challenging than mere i.i.d. Gaussian channels. Small
distances between antennas embedded in the wireless devices as well as solid angles of transmission and reception
of signal energy tend to correlate (potentially strongly) the emitted and received signals. A widely used channel
¹ Such as the transmit/receive covariance matrices, the deterministic line of sight components etc.
² $\mathbf{R}_k^{1/2}$ and $\mathbf{T}_k^{1/2}$ are then defined as their unique nonnegative definite square roots.
³ We use here the term ‘conjecture’ instead of ‘demonstrate’ since the exactness of the replica method has not yet been mathematically proven.
model, which naturally unfolds from the knowledge of a correlation pattern at the transmitter and at the receiver
[35] is the so-called Kronecker model, which mathematically translates into random matrices with i.i.d. Gaussian
entries with separable correlation profile, as previously recalled. This model is of particular interest when no line
of sight component is present in the channel and when a sufficiently large number of scatterers is found in the
communication medium. This is the model we will consider in the following study. However note that, in their
substantial contributions [26]-[27], Loubaton et al. provide a deterministic equivalent of capacity of a point-to-point
MIMO channel [26] and its corresponding capacity-achieving input covariance matrix [27] when the channel matrix
$\mathbf{H}$ is modelled as Ricean, i.e. $\mathbf{H} = \mathbf{A} + \mathbf{X}$ where $\mathbf{A}$ is some deterministic matrix, standing for the line-of-sight
component, and $\mathbf{X}$ is Gaussian with general variance profile ($E[|X_{ij}|^2] = \sigma_{ij}$). Of particular practical interest is
also the theoretical work of Tse [9] on MIMO point-to-point capacity in both uncorrelated and correlated channels,
which is validated by ray-tracing simulations.
The main contribution of this paper consists of two major mathematical theorems, contributing to the field
of random matrix theory, and a water-filling algorithm that makes it possible to describe the boundaries of the MAC
and BC rate regions. From a purely mathematical viewpoint, we provide a deterministic equivalent of the Stieltjes
transform of the sum $\mathbf{B}_N$ of $K$ Gaussian matrices with separable variance profile $\mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k\mathbf{X}_k^{\mathsf{H}}\mathbf{R}_k^{1/2}$, with $\mathbf{R}_k$, $\mathbf{T}_k$
nonnegative Hermitian and $\mathbf{X}_k$ i.i.d. and unitarily invariant, plus a deterministic nonnegative Hermitian matrix $\mathbf{S}$,
$$\mathbf{B}_N = \sum_{k=1}^{K} \mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k\mathbf{X}_k^{\mathsf{H}}\mathbf{R}_k^{1/2} + \mathbf{S} \tag{1}$$
This result extends the recent results from Taricco [5] and Honig [3] in which, respectively, all $\mathbf{R}_k$ matrices are
equal or share the same eigenspace. The main consequence of leaving the $\mathbf{R}_k$ matrices unconstrained is that no
limit (for large $N$) eigenvalue distribution of $\mathbf{B}_N$ is available, even when the $\mathbf{R}_k$ and $\mathbf{T}_k$ matrices have a limit
eigenvalue distribution. Therefore, classical results from random matrix theory are not usable here. The second major
result lies in a deterministic equivalent of the Shannon transform of $\mathbf{B}_N$, obtained by integration of the deterministic
equivalent of the Stieltjes transform. This result extends [2] to multiple channels with separable variance profile. For
both deterministic equivalents of the Stieltjes and Shannon transforms, and contrary to most previous contributions
in similar articles, we do not restrict here the eigenvalues of the $\mathbf{R}_k$ and $\mathbf{T}_k$ matrices to be uniformly (over the
matrix sizes) bounded. In the present contribution, those eigenvalues can theoretically grow at a restricted rate, to
be defined later. In practice, this allows one to study a wider scope of channel models than in the previously mentioned
studies; in particular, this study covers the usual case when the $\mathbf{R}_k$ and $\mathbf{T}_k$ matrices are constrained by $\operatorname{tr}(\mathbf{R}_k) = N$,
$\operatorname{tr}(\mathbf{T}_k) = n_k$ ($N$ and $n_k$ the respective sizes of $\mathbf{R}_k$ and $\mathbf{T}_k$).
The remainder of this paper is structured as follows: in Section II, we provide a quick summary of our results
and discuss the extent of their applicability. In Section III, we provide the two main mathematical theorems needed
for practical applications in the subsequent sections. The complete proofs of both theorems are provided in the
appendix. In Section IV, the rate region of MAC and BC channels and the capacity of single-user decoding and
MMSE decoding with inter-cell interference are studied. In this section, we will introduce our third main result:
an iterative water-filling algorithm to describe the boundary of the MAC and BC rate regions. In Section V, we
provide simulation results of the previously derived theoretical formulas and discuss the advantages and limitations
of the deterministic equivalents. Finally, in Section VI, we give our conclusions.
Notation: In the following, boldface lower-case symbols represent vectors, capital boldface characters denote
matrices ($\mathbf{I}_N$ is the $N \times N$ identity matrix). $X_{ij}$ denotes the $(i, j)$ entry of $\mathbf{X}$. The Hermitian transpose is denoted
$(\cdot)^{\mathsf{H}}$. The operators $\operatorname{tr}\mathbf{X}$, $|\mathbf{X}|$ and $\|\mathbf{X}\|$ represent the trace, determinant and spectral norm of matrix $\mathbf{X}$, respectively.
The symbol $E[\cdot]$ denotes expectation. The notation $F^{\mathbf{Y}}$ stands for the empirical distribution of the eigenvalues of
the Hermitian matrix $\mathbf{Y}$. The function $(x)^+$ equals $\max(x, 0)$ for real $x$. For $F$, $G$ two distribution functions, we
denote $F \Rightarrow G$ the vague convergence of $F$ to $G$.
II. SCOPE AND SUMMARY OF MAIN RESULTS
In this section, we summarize the main results of this paper and explain how they naturally help to study, in the
present multi-cell multi-user framework, the effects of channel correlation on the antenna rate efficiency, which we
define as the mean achievable rate provided by each transmit/receive antenna.
A. General Model
Consider a set of $K$ wireless entities whose purpose is to communicate with another wireless device, either
in downlink or in uplink, at the ‘best’ possible rate performance⁴. For clarity and without loss of generality,
consider here the uplink scenario. Denote $\mathbf{H}_k$ the channel matrix model between the $k$th transmitter and the
receiver. Then, assuming a large number of scatterers in the medium between both entities and no line-of-sight
vision, it is often accurate to consider that $\mathbf{H}_k$ is modelled as Kronecker,
$$\mathbf{H}_k = \mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k^{1/2} \tag{2}$$
where, as already mentioned, the $N \times N$ matrix $\mathbf{R}_k^{1/2}$ and the $n_k \times n_k$ matrix $\mathbf{T}_k^{1/2}$ are respectively the unique nonnegative
square roots of the Hermitian nonnegative matrices $\mathbf{R}_k$ and $\mathbf{T}_k$, and $\mathbf{X}_k$ is a realization of a random Gaussian
matrix⁵. The matrices $\mathbf{T}_k$ and $\mathbf{R}_k$ in this scenario model the correlation present in signals transmitted by user $k$
and signals received at the receiver. It is important to stress that those correlation patterns emerge both from
the one-to-one antenna spacings on the volume limited devices and from the solid angles of useful transmitted
and received energy. Without this second factor, it would make sense that all $\mathbf{R}_k$ are equal, which was claimed
for instance in [24]. However, this would mean that signals are received isotropically at the receiver, which turns
out to be often too strong an assumption to characterize practical communication channels. This being said, the
only sensible restriction that one can make on the matrices $\mathbf{R}_k$ and $\mathbf{T}_k$ is for their diagonal entries to be less than or
equal to one, and then for their spectral norms to be less than $N$ and $n_k$ respectively. We will see that our results
require different hypotheses, which nonetheless allow us to treat some channel models with strong correlations. The
⁴ We are purposely cryptic on the term ‘best’, for one might consider very different performance criteria, such as achievable sum-rate, max-min
rate etc.
⁵ Gaussian matrices often refer to random matrices with i.i.d. Gaussian entries.
assumption often considered, e.g. in [26]-[27], is that the spectral norms of $\mathbf{R}_k$ and $\mathbf{T}_k$, for all $k$, are uniformly
bounded (over the matrix sizes), which means that only low correlation patterns can be studied; in the present work,
we will allow a possibly large portion of the eigenvalues of $\mathbf{R}_k$, $\mathbf{T}_k$ to grow at a potentially high rate, along with
growing $N$, $n_k$. More precisely, we provide a sufficient (but not necessary) condition on the number and growth
rate of the eigenvalues of $\mathbf{R}_k$ and $\mathbf{T}_k$ which ensures the asymptotic accuracy of the deterministic equivalents under
study. Under this condition, a wider scope of channel correlation models can be studied than in most previous
contributions, including the classical case when $\operatorname{tr}(\mathbf{R}_k) = N$, $\operatorname{tr}(\mathbf{T}_k) = n_k$.
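As an illustration of the Kronecker model (2) under the trace constraints just mentioned, the following minimal sketch draws one channel realization. The exponential correlation profile `rho**|i-j|`, the helper names and the sizes are assumptions made here for illustration only; the 1/√n scaling of the Gaussian entries follows the normalization used later in Theorem 1.

```python
import numpy as np

def exp_correlation(n, rho):
    """Hypothetical correlation model: entries rho**|i-j| (unit diagonal, so trace = n)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def psd_sqrt(a):
    """Unique nonnegative definite square root of a Hermitian nonnegative definite matrix."""
    w, u = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)          # guard against tiny negative round-off
    return (u * np.sqrt(w)) @ u.conj().T

def kronecker_channel(r, t, rng):
    """Draw H = R^{1/2} X T^{1/2}, with X of size N x n and i.i.d. CN(0, 1/n) entries."""
    N, n = r.shape[0], t.shape[0]
    x = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2 * n)
    return psd_sqrt(r) @ x @ psd_sqrt(t)

rng = np.random.default_rng(0)
R = exp_correlation(8, 0.9)   # receive-side correlation, tr(R) = N = 8
T = exp_correlation(4, 0.5)   # transmit-side correlation, tr(T) = n = 4
H = kronecker_channel(R, T, rng)
```

With unit diagonal entries, the spectral norm of `R` is automatically at most N, which is the "sensible restriction" discussed above; stronger correlation (`rho` close to 1) pushes the largest eigenvalue towards N.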
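As will be evidenced by the information theoretic applications of Sections IV-A and IV-B, most multi-cell or
multi-user capacity performance measures rely more or less directly on the so-called Stieltjes transform $m_N(z)$ of matrices
$\mathbf{B}_N^{\mathcal{S}}$ of the type
$$\mathbf{B}_N^{\mathcal{S}} = \sum_{k \in \mathcal{S}} \mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k\mathbf{X}_k^{\mathsf{H}}\mathbf{R}_k^{1/2} \tag{3}$$
for some given subset $\mathcal{S}$ of $\{1, \ldots, K\}$.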
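The Stieltjes transform $m_N$ of the $N \times N$ Hermitian matrix $\mathbf{B}_N^{\mathcal{S}}$ is defined over $\mathbb{C} \setminus \mathbb{R}^+$ as
$$m_N(z) = \int \frac{1}{\lambda - z}\, dF^{\mathbf{B}_N^{\mathcal{S}}}(\lambda) = \frac{1}{N}\operatorname{tr}\left(\mathbf{B}_N^{\mathcal{S}} - z\mathbf{I}_N\right)^{-1} \tag{4}$$
Of importance is the Shannon transform, denoted $\mathcal{V}(x)$, of $\mathbf{B}_N^{\mathcal{S}}$, which we define, for $x > 0$, as
$$\mathcal{V}(x) = \frac{1}{N}\log\det\left(\mathbf{I}_N + \frac{1}{x}\mathbf{B}_N^{\mathcal{S}}\right) \tag{5}$$
$$= \int_0^{+\infty} \log(1 + \lambda x^{-1})\, dF^{\mathbf{B}_N^{\mathcal{S}}}(\lambda) \tag{6}$$
$$= \int_x^{+\infty} \left(\frac{1}{w} - m_N(-w)\right) dw \tag{7}$$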
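The equivalence of the three forms (5), (6) and (7) can be checked numerically on a small example. The sketch below is only an illustration: the toy matrix (K = 1 with identity correlations), the sizes and the midpoint quadrature used for (7) are choices made here, and logarithms are natural.

```python
import numpy as np

def stieltjes(eigs, z):
    """m_N(z) = (1/N) sum_i 1 / (lambda_i - z), as in (4)."""
    return np.mean(1.0 / (eigs - z))

def shannon(eigs, x):
    """V(x) = (1/N) sum_i log(1 + lambda_i / x), as in (6)."""
    return np.mean(np.log1p(eigs / x))

rng = np.random.default_rng(1)
N, n, x = 6, 12, 0.5
X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2 * n)
B = X @ X.conj().T                      # toy Gram matrix (K = 1, R = I, T = I)
eigs = np.linalg.eigvalsh(B)

# (5): log det form
_, logdet = np.linalg.slogdet(np.eye(N) + B / x)
v_logdet = logdet / N

# (7): integral form, after the substitution w = x / u with u in (0, 1)
u = (np.arange(20000) + 0.5) / 20000    # midpoint rule
w = x / u
m_neg = np.mean(1.0 / (eigs[None, :] + w[:, None]), axis=1)   # m_N(-w)
v_integral = np.mean((1.0 / w - m_neg) * x / u**2)

print(v_logdet, shannon(eigs, x), v_integral)
```

The substitution w = x/u maps the semi-infinite range of (7) onto (0, 1), where the integrand is smooth, so a plain midpoint rule is sufficient here.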
B. Main results
The main results of this work related to random matrix theory are as follows:
• we first present a theorem, namely Theorem 1, which provides a deterministic equivalent of the Stieltjes
transform of generalized $\mathbf{B}_N^{\mathcal{S}}$-like matrices, under the assumption that, with growing $N$ and $n_k$ (with non
trivial ratio $0 < c = N/n \ll N$) the sequences $\{F^{\mathbf{T}_k}\}_{n_k}$ and $\{F^{\mathbf{R}_k}\}_N$ are tight. That is, we provide an
approximation of $m_N(z)$ which does not depend on the realization of the $\mathbf{X}_k$ matrices. The tight sequence
assumption allows some degenerate cases of very strong correlation in $\mathbf{R}_k$ and $\mathbf{T}_k$, which exhibit an eigenvalue
of order of magnitude $N$.⁶
⁶ Remark, for instance, that the scenario in which the device volume is fixed while the number of antennas $N$ grows inside this volume is
prone to exhibit such degenerate behaviour. This case can therefore be treated here.
• we then provide in Theorem 2 a deterministic equivalent $\mathcal{V}^{(0)}(x)$ of the Shannon transform $\mathcal{V}(x)$ of $\mathbf{B}_N^{\mathcal{S}}$. For
this theorem, the assumptions on the $\mathbf{R}_k$ and $\mathbf{T}_k$ matrices are more constraining, since we need to assume
here that the maximum number $r_N$ of the eigenvalues of $\mathbf{T}_k$, $\mathbf{R}_k$ which grow unbounded and their maximum
value $b_N$ satisfy $r_N \log(1 + b_N^2\beta/x) = o(N)$ for some constant $\beta$ to be defined in Theorem 2. However, this
assumption is far less restrictive than the hypothesis of uniformly (with respect to $N$) bounded spectral
norms of $\mathbf{T}_k$ and $\mathbf{R}_k$. For instance, we can theoretically allow
– the matrices $\mathbf{T}_k$ and $\mathbf{R}_k$ to verify $\operatorname{tr}(\mathbf{R}_k) = O(N)$, $\operatorname{tr}(\mathbf{T}_k) = O(N)$, which is of practical interest here.
– all but a finite number of the eigenvalues of $\mathbf{T}_k$, $\mathbf{R}_k$ to be uniformly bounded, the largest of them growing
like eN . For our specific study, this case is less relevant though.
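To get a feeling for the growth condition on r_N and b_N stated above, the following small sketch evaluates the ratio r_N log(1 + b_N² β/x) / N, which the condition requires to vanish as N grows. The values of `beta` and `x` are hypothetical placeholders (β is only specified in Theorem 2), and the scenario shown, a single eigenvalue of order N, is one illustrative choice.

```python
import math

def growth_ratio(N, r_N, b_N, beta, x):
    """r_N * log(1 + b_N**2 * beta / x) / N; the condition asks this to tend to 0."""
    return r_N * math.log1p(b_N ** 2 * beta / x) / N

beta, x = 10.0, 1.0                      # hypothetical constants, for illustration
sizes = (10, 100, 1000, 10000)
# One eigenvalue allowed to grow like N (r_N = 1, b_N = N): the ratio decays.
ratios = [growth_ratio(N, 1, N, beta, x) for N in sizes]
for N, ratio in zip(sizes, ratios):
    print(N, ratio)
```

In this scenario the ratio behaves like 2 log(N)/N, so the condition holds even though the spectral norm is unbounded.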
The major practical interest of Theorems 1 and 2 lies in the possibility to analyze capacity expressions, no longer
as stochastic variables depending on the matrices $\mathbf{X}_k$ but as approximations of deterministic quantities. The study of
those quantities is in general simpler than the study of the stochastic expressions, even if the deterministic results
are actually solutions of involved implicit equations (see Section III). In particular, remember that our problem
introduced in Section I is to study the trade-off ‘capacity gain’ versus ‘cost’ of additional transmit/receive antennas.
For this reason, the typical figures of performance sought are the per-(transmit or receive) antenna normalized
capacity, sum rate or rate region, i.e. the mean efficiency of an antenna of the transmit/receive array. Those are
again related to the Stieltjes and Shannon transforms of $\mathbf{B}_N^{\mathcal{S}}$-like matrices. However, we will not provide in this
study asymptotic sum-capacity expressions, i.e. $N$ times the Shannon transform, for which asymptotic accuracy of
the deterministic equivalents cannot be verified; on this topic, the reader is strongly advised to refer to [27].
In such practical applications as the rate region of BC, the Shannon transform expressions (5) are needed to
calculate the corner points in the rate region; because of MAC-BC duality, to determine the BC rate region, one
needs to treat the dual MAC uplink problem, see Section IV-A; in order to decouple the influence of antenna
correlation and transmit covariance matrices at the transmitters, the correlation matrix $\mathbf{R}_k$ at the $k$th BC receiver
(and then the $k$th MAC transmitter) is replaced by the product $\mathbf{R}_k^{1/2}\mathbf{P}_k\mathbf{R}_k^{1/2}$, with $\mathbf{R}_k$ translating the channel correlation
at the transmitter and $\mathbf{P}_k$ the transmit signal covariance matrix. Finding the matrices $\mathbf{P}_k$ which maximize (5)
is a very difficult problem for finite $N$, which, to the authors’ knowledge, has not been solved, unless all the
$\mathbf{T}_k$ matrices are equal [24]. In Section IV-A, we determine the matrices $\mathbf{P}_k^\star$ which maximize the deterministic
equivalent $\mathcal{V}^{(0)}$ of $\mathcal{V}$ in (5). The following points are of noticeable importance,
• the eigenspaces of the capacity-maximizing $\mathbf{P}_k^\star$ matrices (in the deterministic equivalent) coincide respectively
with the eigenspaces of the correlation matrices $\mathbf{R}_k$ at the receivers.
• the eigenvalues of the $\mathbf{P}_k^\star$ matrices are the solution of a classical optimization problem. They are given by a
water-filling process.
• we provide an iterative water-filling algorithm for the eigenvalues of $\mathbf{P}_k$ which, upon convergence, are proved
to converge to the optimal eigenvalues of $\mathbf{P}_k^\star$, for all $k$. That is to say, we do not prove that the algorithm
converges (though in intensive simulations it always converged) but we state and can prove that, if it does,
then it converges towards the sought solution.
It is important to understand here that, contrary to [24], the above result does not imply that the
capacity-maximizing power allocation in the finite $N$ regime consists in aligning the eigenvectors of the $\mathbf{P}_k$ matrices to
those of the transmit correlation matrices; we show that this strategy does optimize the deterministic equivalent
though.
III. MATHEMATICAL PRELIMINARIES
In this section, we first introduce Theorem 1, which provides a deterministic equivalent for the Stieltjes transform
of matrices $\mathbf{B}_N$ of the type (1). The underlying assumptions of this theorem are made as broad as possible for
mathematical completeness. The Shannon transform of $\mathbf{B}_N$ is then provided in Theorem 2, under tighter assumptions
on the matrices $\mathbf{X}_k$, $\mathbf{R}_k$ and $\mathbf{T}_k$.
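Theorem 1: Let $K \in \mathbb{N}^*$ be some fixed positive integer. For some $N \in \mathbb{N}^*$, let
$$\mathbf{B}_N = \sum_{k=1}^{K} \mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k\mathbf{X}_k^{\mathsf{H}}\mathbf{R}_k^{1/2} + \mathbf{S} \tag{8}$$
be an $N \times N$ matrix with the following hypotheses for all $k \in \{1, \ldots, K\}$,
1) $\mathbf{X}_k = \left(\frac{1}{\sqrt{n_k}} X^k_{ij}\right)$ is $N \times n_k$ with the $X^k_{ij}$ identically distributed complex for all $N$, $i$, $j$, independent for
each fixed $N$, and $E|X^k_{11} - E X^k_{11}|^2 = 1$,
2) $\mathbf{R}_k^{1/2}$ is the $N \times N$ Hermitian nonnegative definite square root of the nonnegative definite Hermitian matrix
$\mathbf{R}_k$,
3) $\mathbf{T}_k = \operatorname{diag}(\tau_1, \ldots, \tau_{n_k})$ is $n_k \times n_k$, $n_k \in \mathbb{N}^*$, diagonal with $\tau_i \ge 0$,
4) The sequences $\{F^{\mathbf{T}_k}\}_{n_k \ge 1}$ and $\{F^{\mathbf{R}_k}\}_{N \ge 1}$ are tight, i.e. for all $\varepsilon > 0$, there exists $M_0 > 0$ such that
$M > M_0$ implies $F^{\mathbf{T}_k}([M, \infty)) < \varepsilon$ and $F^{\mathbf{R}_k}([M, \infty)) < \varepsilon$ for all $n_k$, $N$,
5) $\mathbf{S}$ is $N \times N$ Hermitian positive definite,
6) There exist $b > a > 0$ for which
$$a \le \liminf_N c_k \le \limsup_N c_k \le b \tag{9}$$
with $c_k = N/n_k$.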
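Also denote, for $z \in \mathbb{C} \setminus \mathbb{R}^+$, $m_N(z) = \int (\lambda - z)^{-1}\, dF^{\mathbf{B}_N}(\lambda)$, the Stieltjes transform of $\mathbf{B}_N$. Then, as all $N$ and
$n_k$ grow large, with (not necessarily fixed) ratios $c_k$,
$$m_N(z) - m_N^{(0)}(z) \xrightarrow{\text{a.s.}} 0 \tag{10}$$
where
$$m_N^{(0)}(z) = \frac{1}{N}\operatorname{tr}\left(\mathbf{S} + \sum_{k=1}^{K} \int \frac{\tau_k\, dF^{\mathbf{T}_k}(\tau_k)}{1 + c_k \tau_k e_k(z)}\, \mathbf{R}_k - z\mathbf{I}_N\right)^{-1} \tag{11}$$
and the set of functions $\{e_i(z)\}$, $i \in \{1, \ldots, K\}$, form the unique solution to the $K$ equations
$$e_i(z) = \frac{1}{N}\operatorname{tr}\mathbf{R}_i\left(\mathbf{S} + \sum_{k=1}^{K} \int \frac{\tau_k\, dF^{\mathbf{T}_k}(\tau_k)}{1 + c_k \tau_k e_k(z)}\, \mathbf{R}_k - z\mathbf{I}_N\right)^{-1} \tag{12}$$
such that $\operatorname{sgn}(\Im[e_i(z)]) = \operatorname{sgn}(\Im[z])$.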
Moreover, for any $\varepsilon > 0$, the convergence of Equation (10) is uniform over any region of $\mathbb{C}$ bounded by a contour
interior to
$$\mathbb{C} \setminus \left(\{z : |z| \le \varepsilon\} \cup \{z = x + iv : x > 0, |v| \le \varepsilon\}\right)$$
For all $N$, the function $m_N^{(0)}$ is the Stieltjes transform of a distribution $F_N^0$. Denoting $F^{\mathbf{B}_N}$ the empirical eigenvalue
distribution of $\mathbf{B}_N$, we finally have
$$F^{\mathbf{B}_N} - F_N^0 \Rightarrow 0 \tag{13}$$
weakly as $N \to \infty$.
Proof: The proof of Theorem 1 is deferred to Appendix A.
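The implicit equations (11)-(12) can be explored numerically on the negative real axis. The sketch below solves (12) at z = −x, x > 0, by a plain fixed-point iteration and compares the resulting m_N^(0)(−x) with the empirical Stieltjes transform of one random draw. The iteration scheme, the function names, the correlation profiles and all sizes are assumptions made here for illustration; this section does not state which numerical procedure the authors use.

```python
import numpy as np

def resolvent(R_list, tau_list, e, x, S=None):
    """Inverse of S + sum_k delta_k R_k + x I_N, the matrix inside (11)-(12) at z = -x."""
    N = R_list[0].shape[0]
    A = x * np.eye(N) if S is None else S + x * np.eye(N)
    for R, tau, e_k in zip(R_list, tau_list, e):
        c_k = N / len(tau)
        # delta_k = integral of tau / (1 + c_k tau e_k) against F^{T_k}
        A = A + np.mean(tau / (1.0 + c_k * tau * e_k)) * R
    return np.linalg.inv(A)

def solve_e(R_list, tau_list, x, S=None, tol=1e-12, max_iter=10_000):
    """Plain fixed-point iteration on the K equations (12) at z = -x, x > 0."""
    N = R_list[0].shape[0]
    e = np.ones(len(R_list))
    for _ in range(max_iter):
        A_inv = resolvent(R_list, tau_list, e, x, S)
        e_new = np.array([np.trace(R @ A_inv).real / N for R in R_list])
        if np.max(np.abs(e_new - e)) < tol:
            return e_new
        e = e_new
    raise RuntimeError("fixed-point iteration did not converge")

def m0(R_list, tau_list, x, S=None):
    """Deterministic equivalent (11) evaluated at z = -x."""
    e = solve_e(R_list, tau_list, x, S)
    return np.trace(resolvent(R_list, tau_list, e, x, S)).real / R_list[0].shape[0]

# Toy comparison with one random draw (sizes and correlation models are hypothetical).
rng = np.random.default_rng(2)
N, x = 120, 1.0
idx = np.arange(N)
R_list = [rho ** np.abs(idx[:, None] - idx[None, :]) for rho in (0.3, 0.7)]
tau_list = [rng.uniform(0.5, 1.5, size=n_k) for n_k in (60, 90)]

B = np.zeros((N, N), dtype=complex)
for R, tau in zip(R_list, tau_list):
    w, U = np.linalg.eigh(R)
    R_half = (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T
    n_k = len(tau)
    X = (rng.standard_normal((N, n_k)) + 1j * rng.standard_normal((N, n_k))) / np.sqrt(2 * n_k)
    B += R_half @ (X * tau) @ X.conj().T @ R_half   # R^{1/2} X T X^H R^{1/2}, T = diag(tau)

m_emp = np.trace(np.linalg.inv(B + x * np.eye(N))).real / N   # m_N(-x)
m_det = m0(R_list, tau_list, x)
print(m_emp, m_det)
```

Note that `m_det` depends only on the matrices R_k and the diagonals of T_k, not on the random draw, which is the practical point of the deterministic equivalent.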
Looser hypotheses will be used in the applications of Theorem 1 provided in Section IV. We will specifically
need the corollary hereafter,
Corollary 1: Let $K \in \mathbb{N}$ be some positive integer. For some $N \in \mathbb{N}^*$, let
$$\mathbf{B}_N = \sum_{k=1}^{K} \mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k\mathbf{X}_k^{\mathsf{H}}\mathbf{R}_k^{1/2} \tag{14}$$
be an $N \times N$ matrix with the following hypotheses for all $k \in \{1, \ldots, K\}$,
1) $\mathbf{X}_k = \left(\frac{1}{\sqrt{n_k}} X^k_{ij}\right)$ is $N \times n_k$ with joint distribution invariant to right unitary product, where the $X^k_{ij}$ are
independent and identically distributed for each $i$, $j$, $N$, with finite fourth order moment.
2) $\mathbf{R}_k^{1/2}$ is the $N \times N$ Hermitian nonnegative definite square root of the nonnegative definite Hermitian matrix
$\mathbf{R}_k$,
3) $\mathbf{T}_k$ is an $n_k \times n_k$ nonnegative definite Hermitian matrix,
4) The sequences $\{F^{\mathbf{T}_k}\}_{n_k \ge 1}$ and $\{F^{\mathbf{R}_k}\}_{N \ge 1}$ are tight.
5) There exist $b > a > 0$ for which
$$a \le \liminf_N c_k \le \limsup_N c_k \le b \tag{15}$$
with $c_k = N/n_k$.
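Also denote, for $x < 0$, $m_N(x) = \frac{1}{N}\operatorname{tr}(\mathbf{B}_N - x\mathbf{I}_N)^{-1}$. Then, as all $N$ and $n_k$ grow large (while $K$ is fixed), with
ratio $c_k$,
$$m_N(x) - m_N^{(0)}(x) \xrightarrow{\text{a.s.}} 0 \tag{16}$$
where
$$m_N^{(0)}(x) = \frac{1}{N}\operatorname{tr}\left(\sum_{k=1}^{K} \int \frac{\tau_k\, dF^{\mathbf{T}_k}(\tau_k)}{1 + c_k \tau_k e_k(x)}\, \mathbf{R}_k - x\mathbf{I}_N\right)^{-1} \tag{17}$$
and the set of functions $\{e_i(z)\}$, $i \in \{1, \ldots, K\}$, form the unique solution to the $K$ equations
$$e_i(z) = \frac{1}{N}\operatorname{tr}\mathbf{R}_i\left(\sum_{k=1}^{K} \int \frac{\tau_k\, dF^{\mathbf{T}_k}(\tau_k)}{1 + c_k \tau_k e_k(z)}\, \mathbf{R}_k - z\mathbf{I}_N\right)^{-1} \tag{18}$$
such that $\operatorname{sgn}(\Im[e_i(z)]) = \operatorname{sgn}(\Im[z])$.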
Proof: Since the $\mathbf{X}_k$’s are invariant to right-unitary product, the joint distribution of $\mathbf{X}_k\mathbf{U}$ coincides with that of
$\mathbf{X}_k$, for $\mathbf{U}$ any $n_k \times n_k$ unitary matrix. Therefore, $\mathbf{X}_k\mathbf{T}_k\mathbf{X}_k^{\mathsf{H}}$ in Theorem 1 can be substituted by $\mathbf{X}_k(\mathbf{U}\mathbf{T}_k\mathbf{U}^{\mathsf{H}})\mathbf{X}_k^{\mathsf{H}}$
without compromising the final result. As a consequence, the $\mathbf{T}_k$’s can be taken non diagonal nonnegative definite
Hermitian and the result of Theorem 1 still holds true.
The deterministic equivalent of the Stieltjes transform $m_N$ of $\mathbf{B}_N$ is then extended to a deterministic equivalent
of the Shannon transform of $\mathbf{B}_N$ in the following result,
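Theorem 2: Let $x$ be some strictly positive real number. Let $\mathbf{B}_N$ be a random Hermitian matrix as defined in
Corollary 1 with the following additional assumptions
1) there exist $\alpha > 0$ and a sequence $r_N$, such that, for all $N$,
$$\max_{1 \le k \le K} \max\left(\lambda^{\mathbf{T}_k}_{r_N+1}, \lambda^{\mathbf{R}_k}_{r_N+1}\right) \le \alpha \tag{19}$$
where $\lambda^{\mathbf{X}}_1 \ge \ldots \ge \lambda^{\mathbf{X}}_N$ denote the ordered eigenvalues of the $N \times N$ matrix $\mathbf{X}$.
2) denoting $b_N$ an upper-bound on the spectral norm of the $\mathbf{T}_k$’s and $\mathbf{R}_k$’s, $k \in \{1, \ldots, K\}$, and $\beta$ some real
such that $\beta > K(b/a)(1 + \sqrt{a})^2$, then $a_N = b_N^2\beta$ satisfies
$$r_N \log(1 + a_N/x) = o(N) \tag{20}$$
Then, for large $N$, $n_k$, the Shannon transform $\mathcal{V}(x) = \int \log(1 + \lambda/x)\, dF^{\mathbf{B}_N}(\lambda)$ of $\mathbf{B}_N$ satisfies
$$\mathcal{V}(x) - \mathcal{V}^{(0)}(x) \xrightarrow{\text{a.s.}} 0 \tag{21}$$
where
$$\begin{aligned}
\mathcal{V}^{(0)}(x) ={}& \frac{1}{N}\log\det\left(\mathbf{I}_N + \frac{1}{x}\sum_{k=1}^{K} \mathbf{R}_k \int \frac{\tau_k}{1 + c_k e_k(-x)\tau_k}\, dF^{\mathbf{T}_k}(\tau_k)\right) \\
&+ \sum_{k=1}^{K} \frac{1}{c_k}\int \log\left(1 + c_k e_k(-x)\tau_k\right) dF^{\mathbf{T}_k}(\tau_k) \\
&+ x \cdot m_N^{(0)}(-x) - 1
\end{aligned} \tag{22}$$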
Proof: The proof of Theorem 2 is provided in Appendix B.
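To see formula (22) at work, the sketch below evaluates V^(0)(x) for diagonal T_k and compares it with the per-antenna log-determinant of one Gaussian draw. As before, the plain fixed-point iteration used to obtain the e_k(−x), the function name, the correlation profiles and the sizes are assumptions chosen here for illustration; logarithms are natural.

```python
import numpy as np

def deterministic_shannon(R_list, tau_list, x, tol=1e-12, max_iter=10_000):
    """V^(0)(x) of (22) for T_k = diag(tau_list[k]); natural logarithm."""
    N = R_list[0].shape[0]
    c = [N / len(tau) for tau in tau_list]

    def mixing(e):
        # sum_k delta_k R_k with delta_k = int tau / (1 + c_k e_k tau) dF^{T_k}(tau)
        return sum(np.mean(tau / (1.0 + c_k * e_k * tau)) * R
                   for R, tau, c_k, e_k in zip(R_list, tau_list, c, e))

    e = np.ones(len(R_list))
    for _ in range(max_iter):                       # fixed-point iteration on (18) at z = -x
        A_inv = np.linalg.inv(mixing(e) + x * np.eye(N))
        e_new = np.array([np.trace(R @ A_inv).real / N for R in R_list])
        converged = np.max(np.abs(e_new - e)) < tol
        e = e_new
        if converged:
            break
    else:
        raise RuntimeError("fixed-point iteration did not converge")

    M = mixing(e)
    m0 = np.trace(np.linalg.inv(M + x * np.eye(N))).real / N    # (17) at -x
    _, logdet = np.linalg.slogdet(np.eye(N) + M / x)
    integral_terms = sum(np.mean(np.log1p(c_k * e_k * tau)) / c_k
                         for tau, c_k, e_k in zip(tau_list, c, e))
    return logdet / N + integral_terms + x * m0 - 1.0

# One random draw for comparison (hypothetical sizes and correlation profiles).
rng = np.random.default_rng(3)
N, x = 120, 0.5
idx = np.arange(N)
R_list = [rho ** np.abs(idx[:, None] - idx[None, :]) for rho in (0.2, 0.8)]
tau_list = [rng.uniform(0.5, 1.5, size=n_k) for n_k in (80, 120)]
B = np.zeros((N, N), dtype=complex)
for R, tau in zip(R_list, tau_list):
    w, U = np.linalg.eigh(R)
    R_half = (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T
    n_k = len(tau)
    X = (rng.standard_normal((N, n_k)) + 1j * rng.standard_normal((N, n_k))) / np.sqrt(2 * n_k)
    B += R_half @ (X * tau) @ X.conj().T @ R_half

_, logdet_emp = np.linalg.slogdet(np.eye(N) + B / x)
v_emp = logdet_emp / N
v_det = deterministic_shannon(R_list, tau_list, x)
print(v_emp, v_det)
```

Only per-antenna quantities are compared here, in line with the earlier caveat that N times the Shannon transform is not covered by the deterministic equivalent.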
Remark 1: Recall that, since Gaussian matrices are right-unitary invariant with entries of finite fourth moment,
the special case of $\mathbf{X}_k$ matrices with Gaussian i.i.d. entries fits the requirements of Corollary 1 and Theorem 2. In
the practical applications of the following section, we shall always consider this hypothesis.
Remark 2: Note that this last result is consistent with the work of Telatar [7] and Tulino [2] when $K = 1$ and
the transmission channels are respectively Gaussian with no correlation or with separable variance profile. The
latter is easy to observe from the form of Equation (206) which, up to a Stieltjes-η variable change, is exactly the
expression in [2] when $K = 1$ with separable variance profile.
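The uncorrelated case of Remark 2 can be made explicit by specializing (18) by hand. The worked equation below assumes $K = 1$, $\mathbf{R}_1 = \mathbf{I}_N$ and $\mathbf{T}_1 = \mathbf{I}_{n_1}$, with $c = c_1 = N/n_1$; these choices are made here for illustration and the derivation is not taken from the source.

```latex
e(z) = \left( \frac{1}{1 + c\, e(z)} - z \right)^{-1}
\quad\Longleftrightarrow\quad
c\, z\, e(z)^2 + (z + c - 1)\, e(z) + 1 = 0,
\qquad m_N^{(0)}(z) = e(z).
```

This quadratic is the one satisfied by the Stieltjes transform of the Marchenko-Pastur law with ratio $c$. For instance, at $z = -1$ and $c = 1$ it reduces to $e^2 + e - 1 = 0$, whose positive root is $e = (\sqrt{5} - 1)/2 \approx 0.618$.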
[Figure 1: a base station connected to User 1, User 2, User 3, ..., User K through the channels H1, H2, H3, ..., HK.]
Fig. 1. Downlink scenario in multi-user broadcast channel
IV. APPLICATIONS
In this section, we provide applications of Theorems 1 and 2 to two studies in the field of wireless communications.
First, in Section IV-A, we derive an expression of the rate region of multi-antenna multiple access and broadcast
channels for given correlation matrices at the transmit and receive devices. An iterative power allocation algorithm
is then introduced which is proved, upon convergence, to maximize the deterministic equivalent of the rate region
corner points. Then, in Section IV-B, we provide an analytical expression of the capacity of the single user and
MMSE decoders in wireless MIMO networks with inter-cell interference. For the single user decoder, a derivation
of the optimal power allocation for the sum rate maximization is also provided.
A. Rate Region of Broadcast Channels
1) System Model: Consider a wireless multi-user channel with $K \ge 1$ users indexed from 1 to $K$, controlled
by a single base station. User $k$ is equipped with $n_k$ antennas while the base station is equipped with $N$ antennas.
We additionally denote $c_k = N/n_k$. This situation is depicted in Figure 1.
Although we will discuss the MAC channel between the $K$ users and the base station at length, our primary objective
is to characterize the BC channel between the base station and the users. For this, we denote $\mathbf{s} \in \mathbb{C}^N$, $E[\mathbf{s}\mathbf{s}^{\mathsf{H}}] = \mathbf{P}$,
the signal transmitted by the base station, with power constraint $\operatorname{tr}(\mathbf{P}) \le P$, $P > 0$; $\mathbf{y}_k \in \mathbb{C}^{n_k}$ the signal received
by user $k$ and $\mathbf{n}_k \sim \mathcal{CN}(0, \sigma^2\mathbf{I}_{n_k})$ the noise vector received by user $k$.⁷ The fading MIMO channel between the
base station and user $k$ is denoted $\mathbf{H}_k \in \mathbb{C}^{n_k \times N}$. Moreover we assume that $\mathbf{H}_k$ has a separable variance profile,
⁷ Up to a scaling of the power constraints of the individual users, setting the same noise variance $\sigma^2$ on each receive antenna for every user
does not restrict the generality and simplifies the theoretical expressions.
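i.e. can be decomposed as
$$\mathbf{H}_k = \mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k^{1/2} \tag{23}$$
with $\mathbf{R}_k \in \mathbb{C}^{n_k \times n_k}$ the (Hermitian) correlation matrix at receiver $k$ with respect to the channel $\mathbf{H}_k$, $\mathbf{T}_k \in \mathbb{C}^{N \times N}$
the correlation matrix at the base station for link $\mathbf{H}_k$ and $\mathbf{X}_k \in \mathbb{C}^{n_k \times N}$ a random matrix with Gaussian independent
entries of variance $1/n_k$. In the Kronecker model, $\mathbf{T}_k$ and $\mathbf{R}_k$ satisfy $\operatorname{tr}(\mathbf{T}_k) = N$ and $\operatorname{tr}(\mathbf{R}_k) = n_k$.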
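With the assumptions above, the downlink communication model reads
$$\mathbf{y}_k = \mathbf{H}_k\mathbf{s} + \mathbf{n}_k \tag{24}$$
Denoting equivalently $\mathbf{s}_k$ the signal transmitted in the dual uplink by user $k$, such that $E[\mathbf{s}_k\mathbf{s}_k^{\mathsf{H}}] = \mathbf{P}_k$, $\operatorname{tr}(\mathbf{P}_k) \le P_k$, and $\mathbf{y}$ and $\mathbf{n}$ the signal and the noise received by the base station, we have the converse uplink model
$$\mathbf{y} = \sum_{k=1}^{K} \mathbf{H}_k^{\mathsf{H}}\mathbf{s}_k + \mathbf{n} \tag{25}$$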
In the following, we will derive the BC rate region by means ofthe MAC-BC duality [20]. We then consider
first the achievable MAC rate region.
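2) MAC Rate Region: The (per-receive antenna normalized) rate region $C_{\rm MAC}(P_1,\ldots,P_K;\mathbf{H}^H)$ of the MAC channel $\mathbf{H}^H$ under respective transmit power constraints $P_1,\ldots,P_K$ for users $1$ to $K$ and compound channel $\mathbf{H}^H = [\mathbf{H}_1^H \ldots \mathbf{H}_K^H]$ is given in [22], and reads
$$C_{\rm MAC}(P_1,\ldots,P_K;\mathbf{H}^H) = \bigcup_{\substack{\mathrm{tr}(\mathbf{P}_i)\leq P_i \\ \mathbf{P}_i\geq 0 \\ i=1,\ldots,K}} \left\{ \{R_i, 1\leq i\leq K\} : \sum_{i\in S} R_i \leq \frac1N\log\left|\mathbf{I} + \frac{1}{\sigma^2}\sum_{i\in S}\mathbf{H}_i^H\mathbf{P}_i\mathbf{H}_i\right|, \ \forall S\subset\{1,\ldots,K\} \right\} \tag{26}$$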
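For any set $S \subset \{1,\ldots,K\}$, thanks to Theorem 2, we have approximately, for $N$, $n_k$ large,
$$\frac1N\log\left|\mathbf{I}_N + \frac{1}{\sigma^2}\sum_{i\in S}\mathbf{H}_i^H\mathbf{P}_i\mathbf{H}_i\right| = \frac1N\log\det\left(\mathbf{I}_N + \frac{1}{\sigma^2}\sum_{k\in S}\mathbf{T}_k\int\frac{r_k}{1+c_ke_k(-\sigma^2)r_k}\,dF^{\mathbf{R}_k\mathbf{P}_k}(r_k)\right) + \sum_{k\in S}\frac{1}{c_k}\int\log\left(1+c_ke_k(-\sigma^2)r_k\right)dF^{\mathbf{R}_k\mathbf{P}_k}(r_k) + \sigma^2\cdot m_S^{(0)}(-\sigma^2) - 1 \tag{27}$$
where
$$m_S^{(0)}(-\sigma^2) = \frac1N\mathrm{tr}\left(\sum_{k\in S}\int\frac{r_k\,dF^{\mathbf{R}_k\mathbf{P}_k}(r_k)}{1+c_kr_ke_k(-\sigma^2)}\mathbf{T}_k + \sigma^2\mathbf{I}_N\right)^{-1} \tag{28}$$
and the $e_i$'s satisfy
$$e_i(-\sigma^2) = \frac1N\mathrm{tr}\,\mathbf{T}_i\left(\sum_{k\in S}\int\frac{r_k\,dF^{\mathbf{R}_k\mathbf{P}_k}(r_k)}{1+c_kr_ke_k(-\sigma^2)}\mathbf{T}_k + \sigma^2\mathbf{I}_N\right)^{-1} \tag{29}$$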
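The implicit system (29) can be evaluated numerically. The sketch below is one possible way to do so, under assumptions that are not stated in the paper: the integral against $F^{\mathbf{R}_k\mathbf{P}_k}$ is taken as the average over the $n_k$ eigenvalues of $\mathbf{R}_k\mathbf{P}_k$, and a plain fixed-point iteration started from $e_k = 1$ is assumed to converge. The function name `solve_e` and the stopping rule are hypothetical.

```python
import numpy as np

def solve_e(T_list, rp_eigs, sigma2, tol=1e-12, max_iter=10_000):
    """Iterate (29) for e_k(-sigma^2), k in S, and return (e, m) with m as in (28).

    T_list  : list of N x N Hermitian nonnegative definite matrices T_k
    rp_eigs : list of 1-D NumPy arrays, the eigenvalues of R_k P_k (length n_k)
    """
    N = T_list[0].shape[0]
    c = [N / len(r) for r in rp_eigs]                  # c_k = N / n_k

    def resolvent(e):
        # a_k = average of r / (1 + c_k r e_k) over the eigenvalues of R_k P_k
        a = [np.mean(r / (1.0 + ck * r * ek)) for r, ck, ek in zip(rp_eigs, c, e)]
        M = sum(ak * Tk for ak, Tk in zip(a, T_list)) + sigma2 * np.eye(N)
        return np.linalg.inv(M)

    e = np.ones(len(T_list))                           # initial guess
    for _ in range(max_iter):
        M_inv = resolvent(e)
        e_new = np.array([np.trace(Tk @ M_inv).real / N for Tk in T_list])
        converged = np.max(np.abs(e_new - e)) < tol
        e = e_new
        if converged:
            break
    m = np.trace(resolvent(e)).real / N                # normalized trace in (28)
    return e, m

# Sanity case: K = 1, T = I, R P = I, N = n_k = 4, sigma^2 = 0.1.
e, m = solve_e([np.eye(4)], [np.ones(4)], sigma2=0.1)
```

In this uncorrelated sanity case, (29) reduces to the scalar equation $e = (1+e)/(1 + \sigma^2(1+e))$, which gives a closed form to compare against.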
From these equations, the complete MAC capacity region can be described. To go further, we first need the following result.
Proposition 1: If any of the correlation matrices $\mathbf{R}_k$, $k \in S$, is invertible, then the right-hand side of the capacity approximation (27) is a strictly concave function of $\mathbf{P}_1,\ldots,\mathbf{P}_{|S|}$.
Proof: The proof of Proposition 1 is provided in Appendix C.
From Proposition 1, it immediately follows that the $|S|$-ary set of capacity-achieving matrices $(\mathbf{P}_1^\star,\ldots,\mathbf{P}_{|S|}^\star)$ is unique, provided that any of the $\mathbf{R}_k$'s is invertible. In a very similar way as in [27], we will show that the matrices $\mathbf{P}_k^\star$, $k \in \{1,\ldots,|S|\}$, have the following properties: (i) their eigenspaces are the same as those of their corresponding $\mathbf{R}_k$'s, (ii) their eigenvalues are the solutions of a classical water-filling problem.
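Proposition 2: For every $k \in S$, denote $\mathbf{R}_k = \mathbf{U}_k\mathbf{D}_k\mathbf{U}_k^H$ the spectral decomposition of $\mathbf{R}_k$ with $\mathbf{U}_k$ unitary and $\mathbf{D}_k = \mathrm{diag}(r_{k1},\ldots,r_{kn_k})$ diagonal. Then the capacity-achieving signal covariance matrices $\mathbf{P}_1^\star,\ldots,\mathbf{P}_{|S|}^\star$ satisfy
1) $\mathbf{P}_k^\star = \mathbf{U}_k\mathbf{Q}_k^\star\mathbf{U}_k^H$, with $\mathbf{Q}_k^\star$ diagonal; i.e. the eigenspace of $\mathbf{P}_k^\star$ is the same as the eigenspace of $\mathbf{R}_k$.
2) denoting $\delta_k^\star = \delta_k(-\sigma^2,\mathbf{P}_k^\star)$ and $e_k^\star = e_k(-\sigma^2,\mathbf{P}_k^\star)$, the $i$th diagonal entry $q_{ki}^\star$ of $\mathbf{Q}_k^\star$ satisfies
$$q_{ki}^\star = \left(\mu_k - \frac{1}{c_ke_k^\star r_{ki}}\right)^+ \tag{30}$$
where the $\mu_k$'s are evaluated such that $\mathrm{tr}(\mathbf{Q}_k) = P_k$.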
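Proof: The proof of Proposition 2 follows the proof of Najim, Prop. 4 in [27]. We essentially need to show that, at the point $(\delta_1^\star,\ldots,\delta_{|S|}^\star,e_1^\star,\ldots,e_{|S|}^\star)$, the derivative of (27) along any $\mathbf{Q}_k$ is the same whether the $\delta_k^\star$'s and the $e_k^\star$'s are fixed or vary with $\mathbf{Q}_k$. In other words, using the form (206) for the capacity, let us define the functions
$$V^{(0)}(\mathbf{P}_1,\ldots,\mathbf{P}_{|S|}) = \sum_{k\in S}\frac1N\log\det\left(\mathbf{I}_{n_k} + c_ke_k\mathbf{R}_k\mathbf{P}_k\right) + \frac1N\log\det\left(\mathbf{I}_N + \sum_{k\in S}\delta_k\mathbf{T}_k\right) - \sigma^2\sum_{k=1}^K\delta_k(-\sigma^2)e_k(-\sigma^2) \tag{31}$$
where
$$e_i = e_i(\mathbf{P}_1,\ldots,\mathbf{P}_{|S|}) = \frac1N\mathrm{tr}\,\mathbf{T}_i\left(\sigma^2\left[\mathbf{I}_N + \sum_{k\in S}\delta_k\mathbf{T}_k\right]\right)^{-1} \tag{32}$$
$$\delta_i = \delta_i(\mathbf{P}_1,\ldots,\mathbf{P}_{|S|}) = \frac{1}{n_i}\mathrm{tr}\,\mathbf{R}_i\mathbf{P}_i\left(\sigma^2\left[\mathbf{I}_{n_i} + c_ie_i(z)\mathbf{R}_i\mathbf{P}_i\right]\right)^{-1} \tag{33}$$
and $V : (\mathbf{P}_1,\ldots,\mathbf{P}_{|S|},\delta_1,\ldots,\delta_{|S|},e_1,\ldots,e_{|S|}) \mapsto V^{(0)}(\mathbf{P}_1,\ldots,\mathbf{P}_{|S|})$. Then we need only prove that, for all $k \in S$,
$$\frac{\partial V}{\partial\delta_k}(\mathbf{P}_1,\ldots,\mathbf{P}_{|S|},\delta_1^\star,\ldots,\delta_{|S|}^\star,e_1^\star,\ldots,e_{|S|}^\star) = 0 \tag{34}$$
$$\frac{\partial V}{\partial e_k}(\mathbf{P}_1,\ldots,\mathbf{P}_{|S|},\delta_1^\star,\ldots,\delta_{|S|}^\star,e_1^\star,\ldots,e_{|S|}^\star) = 0 \tag{35}$$
Remark then that
$$\frac{\partial V}{\partial\delta_k}(\mathbf{P}_1,\ldots,\mathbf{P}_{|S|},\delta_1,\ldots,\delta_{|S|},e_1,\ldots,e_{|S|}) = \frac1N\mathrm{tr}\left[\left(\mathbf{I} + \sum_{i\in S}\delta_i\mathbf{T}_i\right)^{-1}\mathbf{T}_k\right] - \sigma^2e_k \tag{36}$$
$$\frac{\partial V}{\partial e_k}(\mathbf{P}_1,\ldots,\mathbf{P}_{|S|},\delta_1,\ldots,\delta_{|S|},e_1,\ldots,e_{|S|}) = c_k\frac1N\mathrm{tr}\left[\left(\mathbf{I} + c_ke_k\mathbf{R}_k\mathbf{P}_k\right)^{-1}\mathbf{R}_k\mathbf{P}_k\right] - \sigma^2\delta_k \tag{37}$$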
At initialization, for all $k \in S$, $\mathbf{Q}_k = \frac{Q_k}{n_k}\mathbf{I}_{n_k}$, $\delta_k = 1$, $e_k = 1$.
while the $\mathbf{P}_k$'s have not converged do
    for $k \in S$ do
        Set $(\delta_k, e_k)$ as solution of (32), (33)
        for $i = 1,\ldots,n_k$ do
            Set $q_{k,i} = \left(\mu_k - \frac{1}{c_ke_kr_{ki}}\right)^+$, with $\mu_k$ such that $\mathrm{tr}\,\mathbf{Q}_k = P_k$.
        end for
    end for
end while
TABLE I
ITERATIVE WATER-FILLING ALGORITHM FOR POWER OPTIMIZATION
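both being null whenever, for all $k$, $e_k = e_k(-\sigma^2,\mathbf{P}_1,\ldots,\mathbf{P}_{|S|})$ and $\delta_k = \delta_k(-\sigma^2,\mathbf{P}_1,\ldots,\mathbf{P}_{|S|})$, which is true in particular for the unique power optimal solution $\mathbf{P}_1^\star,\ldots,\mathbf{P}_{|S|}^\star$ whenever $e_k = e_k^\star$ and $\delta_k = \delta_k^\star$.
When, for all $k$, $e_k = e_k^\star$, $\delta_k = \delta_k^\star$, the maximum of $V$ over the $\mathbf{P}_k$'s is then obtained by maximizing the expressions $\log\det(\mathbf{I}_{n_k} + c_ke_k^\star\mathbf{R}_k\mathbf{P}_k)$ over $\mathbf{P}_k$. By Hadamard's inequality,
$$\det\left(\mathbf{I}_{n_k} + c_ke_k^\star\mathbf{R}_k\mathbf{P}_k\right) \leq \prod_{i=1}^{n_k}\left\|\left(\mathbf{I}_{n_k} + c_ke_k^\star\mathbf{R}_k\mathbf{P}_k\right)_i\right\|_2 \tag{38}$$
where, only here, we denote $(\mathbf{X})_i$ the $i$th column of matrix $\mathbf{X}$. The equality is obtained if and only if $\mathbf{I}_{n_k} + c_ke_k^\star\mathbf{R}_k\mathbf{P}_k$ is diagonal. The equality case arises for $\mathbf{P}_k$ and $\mathbf{R}_k = \mathbf{U}_k\mathbf{D}_k\mathbf{U}_k^H$ co-diagonalizable. In this case, denoting $\mathbf{P}_k = \mathbf{U}_k\mathbf{Q}_k\mathbf{U}_k^H$, the entries of $\mathbf{Q}_k$, constrained by $\mathrm{tr}(\mathbf{Q}_k) = P_k$, are solutions of the classical optimization problem under constraint,
$$\sup_{\substack{\mathbf{Q}_k \\ \mathrm{tr}(\mathbf{Q}_k)\leq P_k}} \log\det\left(\mathbf{I}_{n_k} + c_ke_k^\star\mathbf{Q}_k\mathbf{D}_k\right) \tag{39}$$
whose solution is given by the classical water-filling algorithm. Hence (30).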
We then propose an iterative water-filling algorithm to obtain the power allocation policy which maximizes the right-hand side of (27). This is provided in Table I.
In [27], it is proven that, if this algorithm converges, then it converges to the unique capacity-achieving power allocation policy. However, as recalled in [27], it is difficult to prove the convergence of the algorithm in Table I in general. Nonetheless, in extensive simulations initialized with $\delta_k = 1$, $e_k = 1$, for all $k$, the algorithm in Table I always converged.
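The loop of Table I can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the powers are initialized uniformly as $P_k/n_k$, the pairs $(\delta_k, e_k)$ are updated jointly for all $k$ by a plain fixed-point iteration on (32)-(33), the covariances are assumed co-diagonal with the $\mathbf{R}_k$ (so only eigenvalues are tracked), and all $r_{ki}$ are assumed strictly positive. As noted above, convergence is not guaranteed in general, so both loops are capped.

```python
import numpy as np

def water_fill(gains, power):
    """Return q_i = (mu - 1/g_i)^+ with sum(q) = power, for gains g_i > 0."""
    inv = 1.0 / np.asarray(gains, dtype=float)
    inv_sorted = np.sort(inv)                       # strongest modes first
    for m in range(len(inv), 0, -1):                # try m active modes
        mu = (power + inv_sorted[:m].sum()) / m     # water level for m modes
        if mu > inv_sorted[m - 1]:                  # all m modes get q > 0
            break
    return np.maximum(mu - inv, 0.0)

def iterative_water_filling(T_list, r_list, P_list, sigma2,
                            n_outer=200, n_inner=500, tol=1e-10):
    """T_list: N x N matrices T_k; r_list: eigenvalues r_ki of R_k; P_list: powers P_k."""
    N, K = T_list[0].shape[0], len(T_list)
    c = [N / len(r) for r in r_list]                # c_k = N / n_k
    q = [np.full(len(r), Pk / len(r)) for r, Pk in zip(r_list, P_list)]
    delta, e = np.ones(K), np.ones(K)
    for _ in range(n_outer):
        for _ in range(n_inner):                    # fixed point of (32), (33)
            A = sigma2 * (np.eye(N) + sum(d * T for d, T in zip(delta, T_list)))
            A_inv = np.linalg.inv(A)
            e_new = np.array([np.trace(T @ A_inv).real / N for T in T_list])
            delta_new = np.array([
                np.mean(r * qk / (sigma2 * (1.0 + ck * ek * r * qk)))
                for r, qk, ck, ek in zip(r_list, q, c, e_new)])
            gap = max(np.max(np.abs(e_new - e)), np.max(np.abs(delta_new - delta)))
            e, delta = e_new, delta_new
            if gap < tol:
                break
        # water-filling step (30) with gains c_k e_k r_ki
        q_new = [water_fill(ck * ek * r, Pk)
                 for r, ck, ek, Pk in zip(r_list, c, e, P_list)]
        change = max(np.max(np.abs(a - b)) for a, b in zip(q_new, q))
        q = q_new
        if change < tol:
            break
    return q, e, delta

# Hypothetical two-user example with N = 4 and n_1 = n_2 = 2.
T_list = [np.eye(4), np.diag([2.0, 1.0, 0.5, 0.5])]
r_list = [np.array([1.5, 0.5]), np.array([1.0, 1.0])]
P_list = [1.0, 2.0]
q, e, delta = iterative_water_filling(T_list, r_list, P_list, sigma2=1.0)
```

The returned `q[k]` holds the eigenvalues of $\mathbf{Q}_k$; the covariance itself would be rebuilt as $\mathbf{U}_k\mathbf{Q}_k\mathbf{U}_k^H$.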
Remark 3: It is important to stress that we did not prove that the classical water-filling solution maximizes the true capacity for $N$ finite. We merely showed that, for large $N$, the true capacity expression on the left-hand side of (27) is closely approximated by an expression (the right-hand side of (27)) whose maximum is achieved by the iterative water-filling algorithm of Table I. The optimality of the water-filling solution for finite $N$ is proven, e.g. in [24], for the special case when all $\mathbf{R}_k$ are equal.
Fig. 2. Downlink multi-cell scenario (user 1 receives from base stations 1, 2, ..., K through the channels H1, H2, ..., HK).
3) BC Rate Region: The capacity region of the broadcast multi-antenna channel has been recently shown [23] to be achieved by the dirty paper coding (DPC) algorithm. This region $C_{\rm BC}(P;\mathbf{H})$, for a transmit power constraint $P$ over the compound channel $\mathbf{H}$, is shown by duality arguments to be the set [20]
$$C_{\rm BC}(P;\mathbf{H}) = \bigcup_{\sum_{k=1}^KP_k\leq P} C_{\rm MAC}(P_1,\ldots,P_K;\mathbf{H}^H) \tag{40}$$
which is easily obtained from Equation (26).
B. Multi-User MIMO
1) Signal Model: In this section we study the per-antenna rate performance of wireless networks including a multi-antenna transmitter and a multi-antenna receiver, the latter of which is interfered by several multi-antenna transmitters. This scheme is well-suited to multi-cell wireless networks with orthogonal intra-cell and interfering inter-cell transmissions, both in downlink and in uplink. In particular, this encompasses
• multi-cell uplink: consider a $K$-cell network; the base station of a cell indexed by $i \in \{1,\ldots,K\}$ receives data from a terminal user in this cell⁸ and is interfered by $K-1$ users transmitting on the same physical resource from remote cells indexed by $j \in \{1,\ldots,K\}$, $j \neq i$.
• multi-cell downlink: the user being allocated a given time/frequency resource in a cell indexed by $i \in \{1,\ldots,K\}$ receives data from its dedicated base station and is interfered by $K-1$ base stations in neighboring cells indexed by $j \in \{1,\ldots,K\}$, $j \neq i$. This situation is depicted in Figure 2.
⁸ This user is allocated a given time/frequency resource, which is orthogonal to the time/frequency resources of the other users in the cell; e.g. the multi-access protocol is OFDMA.
In the following, in order not to confuse both scenarios, only the downlink scheme is considered. However, one must keep in mind that the provided results can easily be adapted to the uplink case.
Consider a wireless mobile network with $K \geq 1$ cells indexed from $1$ to $K$, controlled by non-physically connected base stations. We assume that, on a particular time or frequency resource, each base station serves only one user; therefore the base station and the user of cell $j$ will also be indexed by $j$. Without loss of generality, we focus our attention on user $1$, equipped with $N \gg K$ antennas and hereafter referred to as the user or the receiver. Every base station $j \in \{1,\ldots,K\}$ is equipped with $n_j \gg K$ antennas. Similarly to previous sections, we denote $c_j = N/n_j$.
Denote $\mathbf{s}_j \in \mathbb{C}^{n_j}$, $E[\mathbf{s}_j\mathbf{s}_j^H] = \mathbf{I}_{n_j}$, the signal transmitted by base station $j$, and $\mathbf{y} \in \mathbb{C}^N$ and $\mathbf{n} \sim \mathcal{CN}(0,\sigma^2\mathbf{I}_N)$ the signal and noise vectors received by the user. The fading MIMO channel between base station $j$ and the user is denoted $\mathbf{H}_j \in \mathbb{C}^{N \times n_j}$. Moreover, we assume that $\mathbf{H}_j$ is Gaussian with a separable variance profile, given by Equation (23).
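With the assumptions above, the communication model reads
$$\mathbf{y} = \mathbf{H}_1\mathbf{s}_1 + \sum_{j=2}^K\mathbf{H}_j\mathbf{s}_j + \mathbf{n} \tag{41}$$
where $\mathbf{s}_1$ is the useful signal (from base station $1$) and the $\mathbf{s}_j$, $j \geq 2$, constitute interfering signals.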
2) Single User Decoding:
a) Uniform Power Allocation: If the receiving user considers the signals from the $K-1$ interfering transmitters as Gaussian noise with a known variance pattern⁹, then base station $1$ can transmit with arbitrarily low decoding error at a per-receive antenna rate $C_{\rm SU}(\sigma^2)$ given by
$$C_{\rm SU}(\sigma^2) = \frac1N\log\left|\mathbf{I}_N + \frac{1}{\sigma^2}\sum_{j=1}^K\mathbf{H}_j\mathbf{H}_j^H\right| - \frac1N\log\left|\mathbf{I}_N + \frac{1}{\sigma^2}\sum_{j=2}^K\mathbf{H}_j\mathbf{H}_j^H\right| \tag{42}$$
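For given channel realizations, (42) is a difference of two log-determinants and can be evaluated directly. The following minimal NumPy sketch assumes natural logarithms (divide by $\ln 2$ for bits); the function name `c_su` and the random test channels are hypothetical.

```python
import numpy as np

def c_su(H_list, sigma2):
    """Per-receive-antenna rate (42) in nats; H_list[0] is the useful channel,
    the remaining entries are the interfering channels."""
    N = H_list[0].shape[0]
    gram = [H @ H.conj().T for H in H_list]
    interference = sum(gram[1:], np.zeros((N, N)))
    total = gram[0] + interference
    # slogdet returns (sign, log|det|) and is numerically safer than log(det(.))
    _, logdet_all = np.linalg.slogdet(np.eye(N) + total / sigma2)
    _, logdet_int = np.linalg.slogdet(np.eye(N) + interference / sigma2)
    return (logdet_all - logdet_int) / N

rng = np.random.default_rng(0)
N, n = 4, 2
H1 = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2 * n)
H2 = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2 * n)
rate_interfered = c_su([H1, H2], sigma2=0.5)   # with one interfering base station
rate_alone = c_su([H1], sigma2=0.5)            # interference-free reference
```

Averaging `c_su` over many channel draws gives a Monte Carlo estimate of the ergodic rate against which a deterministic approximation can be checked.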
Assume that $N$ and the $n_i$, $i \in \{1,\ldots,K\}$, are large compared to $K$ and such that the maximum eigenvalues of $\mathbf{R}_i$ or $\mathbf{T}_i$ are bounded away from $N$. From Corollary 1, we define the functions $m^{i,(0)}$ as the approximated Stieltjes transforms of $\sum_{j=i}^K\mathbf{H}_j\mathbf{H}_j^H$, $i \in \{1,2\}$,
$$m^{i,(0)}(z) = \frac1N\mathrm{tr}\left(\sum_{k=i}^K\int\frac{\tau_k\,dF^{\mathbf{T}_k}(\tau_k)}{1+c_k\tau_ke_k^i(z)}\mathbf{R}_k - z\mathbf{I}_N\right)^{-1} \tag{43}$$
where, for all $j \in \{1,\ldots,K\}$, $e_j^i(z)$ is solution of the fixed-point equation
$$e_j^i(z) = \frac1N\mathrm{tr}\,\mathbf{R}_j\left(\sum_{k=i}^K\int\frac{\tau_k\,dF^{\mathbf{T}_k}(\tau_k)}{1+c_k\tau_ke_k^i(z)}\mathbf{R}_k - z\mathbf{I}_N\right)^{-1} \tag{44}$$
⁹ In practice, the noise variance does not require the knowledge of the other base station to user channels, as it can be inferred from data sensing in idle mode.
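From Theorem 2, we then have approximately
$$C_{\rm SU}(\sigma^2) = \frac1N\log\det\left(\mathbf{I}_N + \frac{1}{\sigma^2}\sum_{k=1}^K\mathbf{R}_k\int\frac{\tau_k}{1+c_ke_k^1(-\sigma^2)\tau_k}\,dF^{\mathbf{T}_k}(\tau_k)\right) - \frac1N\log\det\left(\mathbf{I}_N + \frac{1}{\sigma^2}\sum_{k=2}^K\mathbf{R}_k\int\frac{\tau_k}{1+c_ke_k^2(-\sigma^2)\tau_k}\,dF^{\mathbf{T}_k}(\tau_k)\right) + \sum_{k=1}^K\frac{1}{c_k}\int\log\left(1+c_ke_k^1(-\sigma^2)\tau_k\right)dF^{\mathbf{T}_k}(\tau_k) - \sum_{k=2}^K\frac{1}{c_k}\int\log\left(1+c_ke_k^2(-\sigma^2)\tau_k\right)dF^{\mathbf{T}_k}(\tau_k) + \sigma^2\cdot\left[m^{1,(0)}(-\sigma^2) - m^{2,(0)}(-\sigma^2)\right] \tag{45}$$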
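b) Power Optimization for Single-User Decoding: In this section we wish to perform power allocation so as to maximize the single-user decoding capacity $C_{\rm SU}(\sigma^2)$ over $\mathbf{P}_1$, the signal variance at the main transmitter. We therefore replace the matrices $\mathbf{T}_j$ by $\mathbf{T}_j^{\frac12}\mathbf{P}_j\mathbf{T}_j^{\frac12}$ in Equation (45).
Let us rewrite $C_{\rm SU}(\sigma^2)$ as
$$C_{\rm SU}(\sigma^2) = \frac1N\log\left|\mathbf{I}_N + \frac{1}{\sigma^2}\mathbf{A}^{-\frac12}\mathbf{R}_1^{\frac12}\mathbf{X}_1\mathbf{T}_1^{\frac12}\mathbf{P}_1\mathbf{T}_1^{\frac12}\mathbf{X}_1^H\mathbf{R}_1^{\frac12}\mathbf{A}^{-\frac12}\right| \tag{46}$$
with $\mathbf{A} = \mathbf{I}_N + \frac{1}{\sigma^2}\sum_{j>1}\mathbf{H}_j\mathbf{P}_j\mathbf{H}_j^H$.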
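According to [25], the optimal power allocation strategy in such a model is $\mathbf{P}_1 = \sum_{i=1}^{n_1}p_i\mathbf{v}_i\mathbf{v}_i^H$ with $\mathbf{V} = [\mathbf{v}_1,\ldots,\mathbf{v}_{n_1}]$ the eigenvector matrix in the spectral decomposition of $\mathbf{T}_1 = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^H$, $\mathbf{\Lambda}$ diagonal, where the $p_i$'s satisfy
$$p_i = \begin{cases} 0, & \text{if } (1/\alpha_i) - 1 \leq \frac{1}{n_1}\sum_{l=1}^{n_1}(1-\alpha_l) \\ \dfrac{1-\alpha_i}{\frac{1}{n_1}\sum_{l=1}^{n_1}(1-\alpha_l)}, & \text{otherwise} \end{cases} \tag{47}$$
for $\alpha_i$ defined as
$$\alpha_i = \left(1 + \frac{1}{\sigma^2}\mathbf{h}_i^H\mathbf{A}^{-\frac12}\left[\mathbf{I}_N + \frac{1}{\sigma^2}\mathbf{A}^{-\frac12}\mathbf{H}_{-i}\mathbf{P}_1\mathbf{H}_{-i}^H\mathbf{A}^{-\frac12}\right]^{-1}\mathbf{A}^{-\frac12}\mathbf{h}_i\right)^{-1} \tag{48}$$
where $\mathbf{h}_i$ is the $i$th column of $\mathbf{H}_1$, and $\mathbf{H}_{-i}$ is $\mathbf{H}_1$ with column $i$ removed.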
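Denoting $\mathbf{h}_i = \mathbf{R}_1^{\frac12}\mathbf{x}_i$, we have $\mathbf{x}_i$ centered Gaussian with covariance $(T_{1ii}/n_1)\mathbf{I}_N$ and independent of $\mathbf{R}_1^{\frac12}\mathbf{A}^{-\frac12}\left[\mathbf{I}_N + \frac{1}{\sigma^2}\mathbf{A}^{-\frac12}\mathbf{H}_{-i}\mathbf{P}_1\mathbf{H}_{-i}^H\mathbf{A}^{-\frac12}\right]^{-1}\mathbf{A}^{-\frac12}\mathbf{R}_1^{\frac12}$. Therefore, asymptotically in $N$, from Lemma 7,
$$\alpha_i = \left(1 + \frac{T_{1ii}}{\sigma^2N}\mathrm{tr}\,\mathbf{R}_1\left[\mathbf{A} + \frac{1}{\sigma^2}\mathbf{H}_{-i}\mathbf{P}_1\mathbf{H}_{-i}^H\right]^{-1}\right)^{-1} \tag{49}$$
$$= \left(1 + \frac{T_{1ii}}{\sigma^2N}\mathrm{tr}\,\mathbf{R}_1\left[\mathbf{A} + \frac{1}{\sigma^2}\mathbf{H}_1\mathbf{P}_1\mathbf{H}_1^H\right]^{-1}\right)^{-1} \tag{50}$$
$$= \left(1 + \frac{T_{1ii}}{\sigma^2N}\mathrm{tr}\,\mathbf{R}_1\left[\mathbf{I}_N + \frac{1}{\sigma^2}\sum_{j=1}^K\mathbf{H}_j\mathbf{P}_j\mathbf{H}_j^H\right]^{-1}\right)^{-1} \tag{51}$$
$$= \left(1 + T_{1ii}\,e_1^1(-\sigma^2)\right)^{-1} \tag{52}$$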
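This leads to the power allocation
$$p_i = \begin{cases} 0, & \text{if } T_{1ii}\,e_1^1(-\sigma^2) \leq \frac{1}{n_1}\sum_{l=1}^{n_1}\left[1 - \left(1+T_{1ll}\,e_1^1(-\sigma^2)\right)^{-1}\right] \\ \dfrac{1 - \left(1+T_{1ii}\,e_1^1(-\sigma^2)\right)^{-1}}{\frac{1}{n_1}\sum_{l=1}^{n_1}\left[1 - \left(1+T_{1ll}\,e_1^1(-\sigma^2)\right)^{-1}\right]}, & \text{otherwise} \end{cases} \tag{53}$$
In particular, if the diagonal entries of $\mathbf{T}_1$ are all equal (as is often the case in practice), then the optimal power allocation policy is the trivial equal power allocation.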
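Once $e_1^1(-\sigma^2)$ is known, the allocation (53) is a closed-form rule on the diagonal entries of $\mathbf{T}_1$. The short sketch below transcribes that rule literally; the function name `su_power_allocation` is hypothetical, and the value of $e_1^1(-\sigma^2)$ is treated as a given input with at least one $T_{1ii} > 0$.

```python
import numpy as np

def su_power_allocation(t_diag, e11):
    """Power levels p_i of (53) from the diagonal entries T_{1,ii} and e_1^1(-sigma^2)."""
    t = np.asarray(t_diag, dtype=float)
    one_minus_alpha = 1.0 - 1.0 / (1.0 + t * e11)   # 1 - alpha_i, alpha_i as in (52)
    level = one_minus_alpha.mean()                  # (1/n_1) sum_l (1 - alpha_l)
    # entries at or below the threshold are switched off, the others are rescaled
    return np.where(t * e11 <= level, 0.0, one_minus_alpha / level)

p_equal = su_power_allocation([1.0, 1.0, 1.0, 1.0], e11=2.0)   # equal diagonal entries
p_mixed = su_power_allocation([0.0, 1.0, 1.0, 1.0], e11=2.0)   # one silent direction
```

With equal diagonal entries the rule returns the uniform allocation, which is the special case mentioned in the text.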
3) MMSE Decoding: Achieving $C_{\rm SU}$ requires non-linear processing at the receiver, such as successive MMSE interference cancellation techniques. A suboptimal linear technique, the MMSE decoder, is often used instead. The communication model in this case reads
$$\mathbf{y} = \left(\sum_{j=1}^K\mathbf{H}_j\mathbf{H}_j^H + \sigma^2\mathbf{I}_N\right)^{-1}\mathbf{H}_1^H\left(\sum_{j=1}^K\mathbf{H}_j\mathbf{s}_j + \mathbf{n}\right) \tag{54}$$
where $\left(\sum_{j=1}^K\mathbf{H}_j\mathbf{H}_j^H + \sigma^2\mathbf{I}_N\right)^{-1}\mathbf{H}_1^H$ is the MMSE linear filter at the receiver. Each entry of $\mathbf{y}$ will be processed individually.
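This technique makes it possible to transmit data reliably at any rate below the per-antenna MMSE capacity $C_{\rm MMSE}$,
$$C_{\rm MMSE}(\sigma^2) = \frac1N\sum_{i=1}^{n_1}\log(1+\gamma_i) \tag{55}$$
where, denoting $\mathbf{h}_j \in \mathbb{C}^N$ the $j$th column of $\mathbf{H}_1$ and $\mathbf{R}_1^{\frac12}\mathbf{x}_j = \mathbf{h}_j$, the SINR $\gamma_i$ is expressed as
$$\gamma_i = \frac{\mathbf{h}_i^H\left(\sum_{j=1}^K\mathbf{H}_j\mathbf{H}_j^H + \sigma^2\mathbf{I}_N\right)^{-1}\mathbf{h}_i}{1 - \mathbf{h}_i^H\left(\sum_{j=1}^K\mathbf{H}_j\mathbf{H}_j^H + \sigma^2\mathbf{I}_N\right)^{-1}\mathbf{h}_i} \tag{56}$$
$$= \mathbf{h}_i^H\left(\sum_{j=1}^K\mathbf{H}_j\mathbf{H}_j^H - \mathbf{h}_i\mathbf{h}_i^H + \sigma^2\mathbf{I}_N\right)^{-1}\mathbf{h}_i \tag{57}$$
$$= \mathbf{x}_i^H\mathbf{R}_1^{\frac12}\left(\sum_{j=1}^K\mathbf{H}_j\mathbf{H}_j^H - \mathbf{h}_i\mathbf{h}_i^H + \sigma^2\mathbf{I}_N\right)^{-1}\mathbf{R}_1^{\frac12}\mathbf{x}_i \tag{58}$$
where Equation (57) comes from a direct application of the matrix inversion lemma. With these notations, $\mathbf{x}_i$ has i.i.d. complex Gaussian entries with variance $T_{1ii}/n_1$ and the inner matrix of the right-hand side of (58) is independent of $\mathbf{x}_i$ (since the entries of $\mathbf{H}_1\mathbf{H}_1^H - \mathbf{h}_i\mathbf{h}_i^H$ are independent of the entries of $\mathbf{h}_i$). Applying Lemma 7, for $N$ large,
$$\gamma_i = \frac{T_{1ii}}{N}\mathrm{tr}\,\mathbf{R}_1\left(\sum_{j=1}^K\mathbf{H}_j\mathbf{H}_j^H - \mathbf{h}_i\mathbf{h}_i^H + \sigma^2\mathbf{I}_N\right)^{-1} \tag{59}$$
From Lemma 5, the rank-1 perturbation $(-\mathbf{h}_i\mathbf{h}_i^H)$ does not asymptotically affect the trace in (59). Therefore, approximately,
$$\gamma_i = \frac{T_{1ii}}{N}\mathrm{tr}\,\mathbf{R}_1\left(\sum_{j=1}^K\mathbf{H}_j\mathbf{H}_j^H + \sigma^2\mathbf{I}_N\right)^{-1} \tag{60}$$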
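The equality between the SINR forms (56) and (57) is an exact matrix identity, so it can be checked numerically on any channel draw. The sketch below does this with NumPy for a hypothetical two-cell setup; the dimensions, the seed and the i.i.d. channels are arbitrary choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n1, n2, sigma2 = 8, 4, 4, 0.5

def draw(rows, cols):
    """i.i.d. complex Gaussian matrix with entries of variance 1/cols."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2 * cols)

H1, H2 = draw(N, n1), draw(N, n2)
A = H1 @ H1.conj().T + H2 @ H2.conj().T + sigma2 * np.eye(N)
h = H1[:, 0]                                    # first column of H1

# Form (56): ratio built on the full matrix A
x = (h.conj() @ np.linalg.solve(A, h)).real
gamma_56 = x / (1.0 - x)

# Form (57): the contribution of h is removed before inverting
A_minus = A - np.outer(h, h.conj())
gamma_57 = (h.conj() @ np.linalg.solve(A_minus, h)).real
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical convenience and does not change the identity being checked.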
Noting that $e_1^1(z)$ in Equation (44) corresponds to the normalized trace in Equation (60), we finally have the compact expression for $C_{\rm MMSE}$,
$$C_{\rm MMSE}(\sigma^2) = \frac1N\sum_{i=1}^{n_1}\log\left(1 + T_{1ii}\,e_1^1(-\sigma^2)\right) \tag{61}$$
In practice, when no specific power allocation strategy is applied and when no exotic transmit correlation is present, $T_{1ii} = P/n_1$, the average power per transmit symbol, and the capacity becomes $C_{\rm MMSE} = \frac{1}{c_1}\cdot\log\left(1 + P/n_1\cdot e_1^1(-\sigma^2)\right)$.
V. SIMULATIONS AND RESULTS
In the following, we apply the results obtained in Sections IV-A and IV-B for the broadcast channel rate region
and the multi-user MIMO cases respectively.
A. BC Rate Region
First, we provide simulation results in the context of a two-user broadcast channel, with $N = 8$ transmit antennas and $n_1 = n_2 = 4$ receive antennas, all placed in linear arrays. The distance $d^T$ between subsequent transmit antennas is such that $d^T/\lambda = 10$, with $\lambda$ the signal median wavelength; between adjacent receive antennas, the distance $d^R$ is the same for users $1$ and $2$ and such that $d^R/\lambda = 1/4$. The variance profile is generated thanks to Jakes' model, with privileged directions of signal departure and arrival. For instance, the entry $(a,b)$ of matrix $\mathbf{T}_1$ is
$$T_{1ab} = \int_{\theta_{\min}^{(T_1)}}^{\theta_{\max}^{(T_1)}}\exp\left(2\pi i|a-b|\frac{d^T}{\lambda}\cos(\theta)\right)d\theta \tag{62}$$
where, in our case, with obvious notations, $\theta_{\min}^{(R_1)} = 0$, $\theta_{\max}^{(R_1)} = \pi/2$, $\theta_{\min}^{(R_2)} = \pi/3$, $\theta_{\max}^{(R_2)} = 5\pi/6$, $\theta_{\min}^{(T_1)} = 2\pi/3$, $\theta_{\max}^{(T_1)} = -5\pi/6$, $\theta_{\min}^{(T_2)} = \pi$, $\theta_{\max}^{(T_2)} = -\pi/2$.
In Figure 3, we take a $20$ dB value for the SNR and compare the rate region obtained from the convex optimization scheme of Section IV to the equal power allocation strategy. Notice that, in spite of the strong correlation exhibiting very low eigenvalues in $\mathbf{R}_1$ and $\mathbf{R}_2$, very little is gained by our power allocation scheme compared to uniform power allocation. Quite to the contrary, in Figure 4, we consider a $-5$ dB SNR, and observe a substantial gain in capacity from the optimal iterative water-filling algorithm of Table I compared to equal power allocation on the transmit antenna array.
In Figure 5, a comparison is made between the theoretical and simulated rate regions for a $20$ dB SNR. The latter is obtained from $1{,}000$ averaged Monte Carlo simulations for every transmit power pair $(P_1, P_2)$. We observe an almost perfect fit, even for these low $N = 8$, $n_1 = n_2 = 4$ numbers of transmit and receive antennas.
B. Multi-User MIMO
We now apply Equations (45) and (61) to the downlink of a two-cell network. The capacity analyzed here is the per-antenna achievable rate on the link between base station $1$ and the user, the latter of which is interfered by base station $2$. The relative power of the interfering signal from base station $2$ is on average $\Gamma$ times that of user $1$. Both base stations $1$ and $2$ are equipped with linear arrays of $n$ antennas and the user with a linear array of $N$ antennas. The correlation matrices $\mathbf{T}_i$ at the transmission and $\mathbf{R}_i$ at the reception, $i \in \{1,2\}$, are also modeled thanks to the generalized Jakes' model.
Fig. 3. (Per-antenna) rate region $C_{\rm BC}$ for $K = 2$ users, $N = 8$, $n_1 = n_2 = 4$, SNR $= 20$ dB, random transmit-receive solid angle of aperture $\pi/2$, $d^T/\lambda = 10$, $d^R/\lambda = 1/4$. In thick line, capacity limit when $E[\mathbf{s}\mathbf{s}^H] = \mathbf{I}_N$. Axes: rate of user 1 versus rate of user 2 [bits/s/Hz].
In Figure 6, we took $N = 16$, $\Gamma = 0.25$ and we consider single-user decoding at the receiver. For every realization of $\mathbf{T}_i$, $\mathbf{R}_i$, $1000$ channel realizations are processed to produce the simulated ergodic capacity, which is compared to the theoretical capacity (45). Those capacities are then averaged over $100$ realizations of $\mathbf{T}_i$, $\mathbf{R}_i$, varying in the random choice of $\theta_{\min}^{(R_i)}$, $\theta_{\min}^{(T_i)}$, $\theta_{\max}^{(R_i)}$ and $\theta_{\max}^{(T_i)}$ with constraint $\theta_{\max}^{(R_i)} - \theta_{\min}^{(R_i)} = \theta_{\max}^{(T_i)} - \theta_{\min}^{(T_i)} = \pi/2$, while the distance between antennas indexed by $a$ and $b$ is $d_{ab}^{T_i} = 10\lambda|a-b|$ at the transmitters and $d_{ab}^R = 2\lambda|a-b|$ at the receiver. The SNR ranges from $-5$ dB to $30$ dB, and $n \in \{8, 16\}$. We observe here that Monte Carlo simulations perfectly match the capacity obtained from Equation (45).
In Figure 7, with the same assumptions as previously, we apply MMSE decoding at the base station. Here, a slight difference is observed in the high SNR regime between theory and practice. This was somewhat expected, since the large $N$ approximations in Lemmas 7 and 5 especially are very loose for $\sigma^2$ close to $\mathbb{R}^-$. To cope with this gap, many more antennas must be used. We also observe a significant difference in performance between the optimal single-user and the suboptimal linear MMSE decoders, especially in the high SNR region. Therefore, in wireless networks, when interfering cells are treated as Gaussian correlated noise at the cell-edge, i.e. where the interference is maximum, the MMSE decoder incurs a substantial performance loss.
Fig. 4. (Per-antenna) rate region $C_{\rm BC}$ for $K = 2$ users, $N = 8$, $n_1 = n_2 = 4$, SNR $= -5$ dB, random transmit-receive solid angle of aperture $\pi/2$, $d^T/\lambda = 10$, $d^R/\lambda = 1/4$. In thick line, capacity limit when $E[\mathbf{s}\mathbf{s}^H] = \mathbf{I}_N$. Axes: rate of user 1 versus rate of user 2 [bits/s/Hz].
Finally, in Figure 8, we study the effect of the angle spread of energy transmission and reception. As previously recalled, in most contributions treating antenna correlation, only the distance between the antennas is considered relevant to the signal correlation. We provide hereafter a classical situation for which angles of departure and arrival are of critical importance to the system rate performance. The situation is similar to that of Figure 6 with $n = 8$ ($N = 16$), SNR $= 20$ dB; the median directions of arrival and departure (for all devices) are taken from $\pi/32$ (grazing angle) to $\pi/2$ (signal transmitted/received in front), while the angle spread, on the $x$-axis, varies from $\pi/32$ to $\pi/2$. First, note that the directions of departure or arrival are significant to the system performance: for instance, with a quite realistic angle spread of $\pi/4$, the single-user decoding capacity shows a three-fold increase from grazing to orthogonal angles.¹⁰ We observe that, in all cases, the per-antenna capacity grows with the angle spread, which therefore offers some sort of diversity gain. Also, grazing angles provide significantly less diversity gain than square angles of departure or arrival.
¹⁰ Remember that we assume isotropic energy emission/reception in the top-bottom direction and we consider here a restrictive angle of captured energy in the horizontal plane so that, in reality, one might expect even more staggering results.
Fig. 5. (Per-antenna) rate region $C_{\rm BC}$ for $K = 2$ users, theory against simulation, $N = 8$, $n_1 = n_2 = 4$, SNR $= 20$ dB, random transmit-receive solid angle of aperture $\pi/2$, $d^T/\lambda = 10$, $d^R/\lambda = 1/4$. Axes: rate of user 1 versus rate of user 2 [bits/s/Hz]; curves: theory, simulation.
VI. CONCLUSION
In this contribution, we proposed to analyze the per-antenna rate performance of a wide family of multi-antenna
communication schemes including multiple cells and/or multiple users, and taking into account the correlation effects
due to antenna closeness and directional energy transmission/reception. As an introductory example, we studied
the rate region of MAC and BC channels, as well as the uplink and downlink capacity of multi-cell networks with
interference. Our main results stem from a novel mathematical deterministic equivalent of the Stieltjes transform
and the Shannon transform of a certain type of large random matrices. Based on these new tools, an accurate
analysis of the effects of correlation can be directly translated into the antenna efficiency of multi-user multi-cell
systems. We also provided an iterative water-filling algorithm to achieve the capacity boundary of the MAC and
BC rate regions, which proved in simulation to give tremendous capacity gains in the low SNR regime.
Fig. 6. Capacity of point-to-point MIMO in two-cell uplink, single-user decoding, $N = 16$, $n \in \{8, 16\}$, $\Gamma = 25\%$. Axes: SNR [dB] versus per-receive antenna capacity $C_{\rm SU}$ [bits/s/Hz]; curves: simulation and theory for $n = 8$ and $n = 16$.
APPENDIX A
PROOF OF THEOREM 1
Proof: For ease of reading, the proof is divided into several sections.
We first consider the case $K = 1$, whose generalization to $K \geq 1$ is given in Appendix A-E. Therefore, in the coming sections, we drop the unnecessary indexes.
A. Truncation and centralization
We begin with the truncation and centralization steps which will replace $\mathbf{X}$, $\mathbf{R}$ and $\mathbf{T}$ by matrices with bounded entries, more suitable for analysis; the difference of the Stieltjes transforms of the original and new $\mathbf{B}_N$ converging to zero. Since vague convergence of distribution functions is equivalent to the convergence of their Stieltjes transforms, it is sufficient to show the original and new empirical distribution functions of the eigenvalues approach each other almost surely in the space of subprobability measures on $\mathbb{R}$ with respect to the topology which yields vague convergence.
Let $\tilde{X}_{ij} = X_{ij}1_{\{|X_{ij}|<\sqrt{N}\}} - E(X_{ij}1_{\{|X_{ij}|<\sqrt{N}\}})$ and $\tilde{\mathbf{X}} = \left(\frac{1}{\sqrt{n}}\tilde{X}_{ij}\right)$. Then, from c), Lemma 1 and a), Lemma 3, it follows exactly as in the initial truncation and centralization steps in [4] and [19] (which provide more details in their appendices), that
$$\left\|F^{\mathbf{B}_N} - F^{\mathbf{S}+\mathbf{R}^{\frac12}\tilde{\mathbf{X}}\mathbf{T}\tilde{\mathbf{X}}^H\mathbf{R}^{\frac12}}\right\| \xrightarrow{\rm a.s.} 0 \tag{63}$$
as $N \to \infty$.
Fig. 7. Capacity of point-to-point MIMO in two-cell uplink, with MMSE decoder, $N = 16$, $n \in \{8, 16\}$, $\Gamma = 25\%$. Axes: SNR [dB] versus per-receive antenna capacity $C_{\rm MMSE}$ [bits/s/Hz]; curves: simulation and theory for $n = 8$ and $n = 16$.
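Let now $\bar{X}_{ij} = X_{ij}\cdot 1_{\{|X_{ij}|<\ln N\}} - E(X_{ij}1_{\{|X_{ij}|<\ln N\}})$ and $\bar{\mathbf{X}} = \left(\frac{1}{\sqrt{n}}\bar{X}_{ij}\right)$. This is the final truncation and centralization step, which will be practically handled the same way as in [4], with some minor modifications, given presently.
For any Hermitian non-negative definite $r \times r$ matrix $\mathbf{A}$, let $\lambda_i^{\mathbf{A}}$ denote its $i$-th smallest eigenvalue. With $\mathbf{A} = \mathbf{U}\mathrm{diag}(\lambda_1^{\mathbf{A}},\ldots,\lambda_r^{\mathbf{A}})\mathbf{U}^H$ its spectral decomposition, let for any $\alpha > 0$
$$\mathbf{A}^\alpha = \mathbf{U}\mathrm{diag}\left(\lambda_1^{\mathbf{A}}1_{\{\lambda_1^{\mathbf{A}}\leq\alpha\}},\ldots,\lambda_r^{\mathbf{A}}1_{\{\lambda_r^{\mathbf{A}}\leq\alpha\}}\right)\mathbf{U}^H \tag{64}$$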
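The truncation (64) simply zeroes the eigenvalues above $\alpha$, so that $\mathbf{A} - \mathbf{A}^\alpha$ has rank equal to the number of eigenvalues exceeding $\alpha$. The small NumPy sketch below illustrates this on a hypothetical $4 \times 4$ example; the function name `truncate_spectrum` and the explicit rank tolerance are illustration choices.

```python
import numpy as np

def truncate_spectrum(A, alpha):
    """Return A^alpha as in (64): eigenvalues above alpha are replaced by zero."""
    lam, U = np.linalg.eigh(A)
    kept = np.where(lam <= alpha, lam, 0.0)
    return (U * kept) @ U.conj().T

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))          # random orthogonal basis
A = (Q * np.array([0.5, 1.0, 3.0, 7.0])) @ Q.T            # eigenvalues 0.5, 1, 3, 7
A_alpha = truncate_spectrum(A, alpha=2.0)

# Two eigenvalues (3 and 7) exceed alpha = 2, so A - A^alpha has rank 2.
rank_diff = np.linalg.matrix_rank(A - A_alpha, tol=1e-8)
```

This rank count is what turns into the indicator sums once the rank inequalities of Lemma 3 are applied.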
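Then for any $N \times N$ matrix $\mathbf{Q}$, we get from 1) and 2), Lemma 3,
$$\left\|F^{\mathbf{S}+\mathbf{R}^{\frac12}\mathbf{Q}\mathbf{T}\mathbf{Q}^H\mathbf{R}^{\frac12}} - F^{\mathbf{S}+(\mathbf{R}^\alpha)^{\frac12}\mathbf{Q}\mathbf{T}^\alpha\mathbf{Q}^H(\mathbf{R}^\alpha)^{\frac12}}\right\| \leq \frac2N\mathrm{rank}\left(\mathbf{R}^{\frac12} - (\mathbf{R}^\alpha)^{\frac12}\right) + \frac1N\mathrm{rank}\left(\mathbf{T} - \mathbf{T}^\alpha\right) \tag{65}$$
$$= \frac2N\sum_{i=1}^N1_{\{\lambda_i^{\mathbf{R}}>\alpha\}} + \frac1N\sum_{i=1}^n1_{\{\lambda_i^{\mathbf{T}}>\alpha\}} \tag{66}$$
$$= 2F^{\mathbf{R}}((\alpha,\infty)) + \frac{1}{c_N}F^{\mathbf{T}}((\alpha,\infty)) \tag{67}$$
Therefore, from the assumptions 4) and 6) in Theorem 1, we have for any sequence $\{\alpha_N\}$ with $\alpha_N \to \infty$
$$\left\|F^{\mathbf{S}+\mathbf{R}^{\frac12}\mathbf{Q}\mathbf{T}\mathbf{Q}^H\mathbf{R}^{\frac12}} - F^{\mathbf{S}+(\mathbf{R}^{\alpha_N})^{\frac12}\mathbf{Q}\mathbf{T}^{\alpha_N}\mathbf{Q}^H(\mathbf{R}^{\alpha_N})^{\frac12}}\right\| \to 0 \tag{68}$$
as $N \to \infty$.
Fig. 8. Capacity of point-to-point MIMO in two-cell uplink, single-user decoding, $N = 16$, $n = 8$, $\Gamma = 25\%$, SNR $= 20$ dB, for different median angles of departure/arrival. Axes: departure/arrival angle spread [rad] versus per-receive antenna capacity $C_{\rm SU}$ [bits/s/Hz]; curves: median angles $\pi/2$, $\pi/4$, $\pi/8$, $\pi/16$, $\pi/32$.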
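A metric $D$ on probability measures defined on $\mathbb{R}$, which induces the topology of vague convergence, is introduced in [4] to handle the last truncation step. The matrices studied in [4] are essentially $\mathbf{B}_N$ with $\mathbf{R} = \mathbf{I}_N$. Following the steps beginning at (3.4) in [4], we see in our case that when $\alpha_N$ is chosen so that, as $N \to \infty$, $\alpha_N \uparrow \infty$,
$$\alpha_N^8\left(E|X_{11}|^21_{\{|X_{11}|\geq\ln N\}} + N^{-1}\right) \to 0 \tag{69}$$
and
$$\sum_{N=1}^\infty\frac{\alpha_N^{16}}{N^2}\left(E|X_{11}|^41_{\{|X_{11}|<\sqrt{N}\}} + 1\right) < \infty \tag{70}$$
we will get
$$D\left(F^{\mathbf{S}+(\mathbf{R}^{\alpha_N})^{\frac12}\tilde{\mathbf{X}}\mathbf{T}^{\alpha_N}\tilde{\mathbf{X}}^H(\mathbf{R}^{\alpha_N})^{\frac12}}, F^{\mathbf{S}+(\mathbf{R}^{\alpha_N})^{\frac12}\bar{\mathbf{X}}\mathbf{T}^{\alpha_N}\bar{\mathbf{X}}^H(\mathbf{R}^{\alpha_N})^{\frac12}}\right) \xrightarrow{\rm a.s.} 0 \tag{71}$$
as $N \to \infty$.
Since $E|\bar{X}_{11}|^2 \to 1$ as $N \to \infty$ we can rescale and replace $\bar{\mathbf{X}}$ with $\bar{\mathbf{X}}/\sqrt{E|\bar{X}_{11}|^2}$, whose components are bounded by $k\ln N$ for some $k > 2$. Let $\log N$ denote the logarithm of $N$ with base $e^{1/k}$ (so that $k\ln N = \log N$). Therefore, from (68) and (71) we can assume that for each $N$ the $X_{ij}$ are i.i.d., $EX_{11} = 0$, $E|X_{11}|^2 = 1$, and $|X_{ij}| \leq \log N$.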
Later on the proof will require a restricted growth rate on both $\|\mathbf{R}\|$ and $\|\mathbf{T}\|$. We see from (68) that we can also assume
$$\max(\|\mathbf{R}\|, \|\mathbf{T}\|) \leq \log N \tag{72}$$
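B. Deterministic approximation of $m_N(z)$
Write $\mathbf{X} = [\mathbf{x}_1,\ldots,\mathbf{x}_n]$, $\mathbf{x}_i \in \mathbb{C}^N$ and let $\mathbf{y}_j = (1/\sqrt{n})\mathbf{R}^{\frac12}\mathbf{x}_j$. Then we can write
$$\mathbf{B}_N = \mathbf{S} + \sum_{j=1}^n\tau_j\mathbf{y}_j\mathbf{y}_j^H \tag{73}$$
We assume $z \in \mathbb{C}^+$ and let $v = \Im[z]$. Define
$$e_N = e_N(z) = (1/N)\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_N - z\mathbf{I}_N)^{-1} \tag{74}$$
and
$$p_N = -\frac{1}{nz}\sum_{j=1}^n\frac{\tau_j}{1+c_N\tau_je_N} = \int\frac{-\tau}{z(1+c_N\tau e_N)}\,dF^{\mathbf{T}}(\tau) \tag{75}$$
Write $\mathbf{B}_N = \mathbf{O}\mathbf{\Lambda}\mathbf{O}^H$, $\mathbf{\Lambda} = \mathrm{diag}(\lambda_1,\ldots,\lambda_N)$, its spectral decomposition. Let $\underline{\mathbf{R}} = \{\underline{R}_{ij}\} = \mathbf{O}^H\mathbf{R}\mathbf{O}$. Then
$$e_N = (1/N)\,\mathrm{tr}\,\underline{\mathbf{R}}(\mathbf{\Lambda} - z\mathbf{I}_N)^{-1} = (1/N)\sum_{i=1}^N\frac{\underline{R}_{ii}}{\lambda_i - z} \tag{76}$$
We therefore see that $e_N$ is the Stieltjes transform of a measure on the nonnegative reals with total mass $(1/N)\,\mathrm{tr}\,\mathbf{R}$. It follows that both $e_N(z)$ and $ze_N(z)$ map $\mathbb{C}^+$ into $\mathbb{C}^+$. This implies that $p_N(z)$ and $zp_N(z)$ map $\mathbb{C}^+$ into $\mathbb{C}^+$ and, as $z \to \infty$, $zp_N(z) \to -(1/n)\,\mathrm{tr}\,\mathbf{T}$. Therefore, from Lemma 6, $p_N$ is also the Stieltjes transform of a measure on the nonnegative reals with total mass $(1/n)\,\mathrm{tr}\,\mathbf{T}$. From (72), it follows that
$$|e_N| \leq v^{-1}\log N \tag{77}$$
and
$$\left|\int\frac{\tau}{1+c_N\tau e_N}\,dF^{\mathbf{T}}(\tau)\right| = |zp_N(z)| \leq |z|v^{-1}\log N \tag{78}$$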
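The properties of $e_N$ used above (it maps $\mathbb{C}^+$ into $\mathbb{C}^+$, so does $ze_N$, and $|e_N| \leq \|\mathbf{R}\|/v$) can be observed on a small random instance. The sketch below is only a numerical illustration under arbitrary assumptions: $\mathbf{S} = 0$, a diagonal $\mathbf{R}$, and hypothetical dimensions, seed and evaluation point $z$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 6, 12
X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
r = rng.uniform(0.5, 2.0, N)                 # diagonal of R (hypothetical)
tau = rng.uniform(0.5, 2.0, n)               # diagonal of T (hypothetical)
R = np.diag(r)

Y = (np.sqrt(r)[:, None] * X) / np.sqrt(n)   # columns y_j = n^{-1/2} R^{1/2} x_j
B = (Y * tau) @ Y.conj().T                   # B_N = sum_j tau_j y_j y_j^H, with S = 0

z = 0.3 + 0.7j                               # a point of C^+, v = Im z = 0.7
v = z.imag
e_N = np.trace(R @ np.linalg.inv(B - z * np.eye(N))) / N   # definition (74)
```

The bound $|e_N| \leq \|\mathbf{R}\|/v$ is the finite-dimensional statement behind (77), where $\|\mathbf{R}\|$ is then controlled by $\log N$ through (72).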
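More generally, from Lemma 6, any function of the form
$$\frac{-\tau}{z(1+m(z))} \tag{79}$$
where $\tau \geq 0$ and $m(z)$ is the Stieltjes transform of a finite measure on $\mathbb{R}^+$, is the Stieltjes transform of a measure on the nonnegative reals with total mass $\tau$. It follows that
$$\left|\frac{\tau}{1+m(z)}\right| \leq \tau|z|v^{-1} \tag{80}$$
Fix now $z \in \mathbb{C}^+$. Let $\mathbf{B}_{(j)} = \mathbf{B}_N - \tau_j\mathbf{y}_j\mathbf{y}_j^H$. Define $\mathbf{D} = -z\mathbf{I}_N + \mathbf{S} - zp_N(z)\mathbf{R}$. We write
$$\mathbf{B}_N - z\mathbf{I}_N - \mathbf{D} = \sum_{j=1}^n\tau_j\mathbf{y}_j\mathbf{y}_j^H + zp_N\mathbf{R} \tag{81}$$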
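Taking inverses and using Lemma 4 we have
$$\mathbf{D}^{-1} - (\mathbf{B}_N - z\mathbf{I}_N)^{-1} = \sum_{j=1}^n\tau_j\mathbf{D}^{-1}\mathbf{y}_j\mathbf{y}_j^H(\mathbf{B}_N - z\mathbf{I}_N)^{-1} + zp_N\mathbf{D}^{-1}\mathbf{R}(\mathbf{B}_N - z\mathbf{I}_N)^{-1} \tag{82}$$
$$= \sum_{j=1}^n\tau_j\frac{\mathbf{D}^{-1}\mathbf{y}_j\mathbf{y}_j^H(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}}{1+\tau_j\mathbf{y}_j^H(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} + zp_N\mathbf{D}^{-1}\mathbf{R}(\mathbf{B}_N - z\mathbf{I}_N)^{-1} \tag{83}$$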
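The replacement of $\mathbf{B}_N$ by $\mathbf{B}_{(j)}$ in each summand is a rank-one update. As a reminder of the computation involved (a standard Sherman-Morrison argument, given here as a sketch and assumed to be what Lemma 4 provides), write $\mathbf{G} = (\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}$, a shorthand used only in this block:

```latex
\begin{align*}
(\mathbf{B}_N - z\mathbf{I}_N)^{-1}
  &= (\mathbf{G}^{-1} + \tau_j \mathbf{y}_j\mathbf{y}_j^H)^{-1}
   = \mathbf{G} - \frac{\tau_j\,\mathbf{G}\mathbf{y}_j\mathbf{y}_j^H\mathbf{G}}
                       {1 + \tau_j\,\mathbf{y}_j^H\mathbf{G}\mathbf{y}_j}, \\
\mathbf{y}_j^H(\mathbf{B}_N - z\mathbf{I}_N)^{-1}
  &= \mathbf{y}_j^H\mathbf{G}
     \left(1 - \frac{\tau_j\,\mathbf{y}_j^H\mathbf{G}\mathbf{y}_j}
                    {1 + \tau_j\,\mathbf{y}_j^H\mathbf{G}\mathbf{y}_j}\right)
   = \frac{\mathbf{y}_j^H\mathbf{G}}{1 + \tau_j\,\mathbf{y}_j^H\mathbf{G}\mathbf{y}_j}.
\end{align*}
```

Multiplying the second line on the left by $\tau_j\mathbf{D}^{-1}\mathbf{y}_j$ gives the $j$th summand with $\mathbf{B}_{(j)}$ in place of $\mathbf{B}_N$.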
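Taking traces and dividing by $N$, we have
$$\frac1N\mathrm{tr}\,\mathbf{D}^{-1} - m_N(z) = \frac1n\sum_{j=1}^n\tau_jd_j \equiv w_N^m \tag{84}$$
where
$$d_j = \frac{(1/N)\mathbf{x}_j^H\mathbf{R}^{\frac12}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{D}^{-1}\mathbf{R}^{\frac12}\mathbf{x}_j}{1+\tau_j\mathbf{y}_j^H(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} - \frac{(1/N)\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_N - z\mathbf{I}_N)^{-1}\mathbf{D}^{-1}}{1+c_N\tau_je_N} \tag{85}$$
Multiplying both sides of the above matrix identity by $\mathbf{R}$, and then taking traces and dividing by $N$, we find
$$\frac1N\mathrm{tr}\,\mathbf{D}^{-1}\mathbf{R} - e_N(z) = \frac1n\sum_{j=1}^n\tau_jd_j^e \equiv w_N^e \tag{86}$$
where
$$d_j^e = \frac{(1/N)\mathbf{x}_j^H\mathbf{R}^{\frac12}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{R}\mathbf{D}^{-1}\mathbf{R}^{\frac12}\mathbf{x}_j}{1+\tau_j\mathbf{y}_j^H(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} - \frac{(1/N)\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_N - z\mathbf{I}_N)^{-1}\mathbf{R}\mathbf{D}^{-1}}{1+c_N\tau_je_N} \tag{87}$$
We then show that, for any $k > 0$, almost surely
$$\lim_{N\to\infty}(\log^kN)\,w_N^m = 0 \tag{88}$$
and
$$\lim_{n\to\infty}(\log^kN)\,w_N^e = 0 \tag{89}$$
Notice that for each $j$, $\mathbf{y}_j^H(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j$ can be viewed as the Stieltjes transform of a measure on $\mathbb{R}^+$. Therefore from (80) we have
$$\left|\frac{1}{1+\tau_j\mathbf{y}_j^H(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j}\right| \leq \frac{|z|}{v} \tag{90}$$
For each $j$, let $e_{(j)} = e_{(j)}(z) = (1/N)\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}$, and
$$p_{(j)} = p_{(j)}(z) = \int\frac{-\tau}{z(1+c_N\tau e_{(j)})}\,dF^{\mathbf{T}}(\tau) \tag{91}$$
both being Stieltjes transforms of measures on $\mathbb{R}^+$, along with the integrand for each $\tau$.
Using Lemma 4, Equations (72) and (80), we have
$$|zp_N - zp_{(j)}| = |e_N - e_{(j)}|\,c_N\left|\int\frac{\tau^2}{(1+c_N\tau e_N)(1+c_N\tau e_{(j)})}\,dF^{\mathbf{T}}(\tau)\right| \leq \frac{c_N|z|^2\log^3N}{Nv^3} \tag{92}$$
Let $\mathbf{D}_{(j)} = -z\mathbf{I}_N + \mathbf{S} - zp_{(j)}(z)\mathbf{R}$. Notice that $(\mathbf{B}_N - z\mathbf{I}_N)^{-1}$ and $(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}$ are bounded in spectral norm by $v^{-1}$ and, from Lemma 8, the same holds true for $\mathbf{D}^{-1}$ and $\mathbf{D}_{(j)}^{-1}$.
In order to handle both $w_N^m$, $d_j$ and $w_N^e$, $d_j^e$ at the same time, we shall denote by $\mathbf{E}$ either $\mathbf{R}$ or $\mathbf{I}_N$, and $w_N$, $d_j$ for now will denote either the original $w_N^m$, $d_j$ or $w_N^e$, $d_j^e$.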
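Write $d_j = d_j^1 + d_j^2 + d_j^3 + d_j^4$, where
$$d_j^1 = \frac{(1/N)\mathbf{x}_j^H\mathbf{R}^{\frac12}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{E}\mathbf{D}^{-1}\mathbf{R}^{\frac12}\mathbf{x}_j}{1+\tau_j\mathbf{y}_j^H(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} - \frac{(1/N)\mathbf{x}_j^H\mathbf{R}^{\frac12}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{E}\mathbf{D}_{(j)}^{-1}\mathbf{R}^{\frac12}\mathbf{x}_j}{1+\tau_j\mathbf{y}_j^H(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} \tag{93}$$
$$d_j^2 = \frac{(1/N)\mathbf{x}_j^H\mathbf{R}^{\frac12}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{E}\mathbf{D}_{(j)}^{-1}\mathbf{R}^{\frac12}\mathbf{x}_j}{1+\tau_j\mathbf{y}_j^H(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} - \frac{(1/N)\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{E}\mathbf{D}_{(j)}^{-1}}{1+\tau_j\mathbf{y}_j^H(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} \tag{94}$$
$$d_j^3 = \frac{(1/N)\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{E}\mathbf{D}_{(j)}^{-1}}{1+\tau_j\mathbf{y}_j^H(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} - \frac{(1/N)\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_N - z\mathbf{I}_N)^{-1}\mathbf{E}\mathbf{D}^{-1}}{1+\tau_j\mathbf{y}_j^H(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} \tag{95}$$
$$d_j^4 = \frac{(1/N)\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_N - z\mathbf{I}_N)^{-1}\mathbf{E}\mathbf{D}^{-1}}{1+\tau_j\mathbf{y}_j^H(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} - \frac{(1/N)\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_N - z\mathbf{I}_N)^{-1}\mathbf{E}\mathbf{D}^{-1}}{1+c_N\tau_je_N} \tag{96}$$
From Lemma 4, Equations (72), (90) and (92), we have
$$\tau_j|d_j^1| \leq \frac1N\|\mathbf{x}_j\|^2\frac{c_N\log^7N\,|z|^3}{Nv^7} \tag{97}$$
$$\tau_j|d_j^2| \leq \frac{|z|v^{-1}\log N}{N}\left|\mathbf{x}_j^H\mathbf{R}^{\frac12}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{E}\mathbf{D}_{(j)}^{-1}\mathbf{R}^{\frac12}\mathbf{x}_j - \mathrm{tr}\,\mathbf{R}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{E}\mathbf{D}_{(j)}^{-1}\right| \tag{98}$$
$$\tau_j|d_j^3| \leq \frac{|z|\log^3N}{vN}\left(\frac{1}{v^2} + \frac{c_N|z|^2\log^3N}{v^6}\right) \to 0, \text{ as } n \to \infty \tag{99}$$
$$\tau_j|d_j^4| \leq \frac{|z|c_N\log^4N}{Nv^3}\left(\left|\mathbf{x}_j^H\mathbf{R}^{\frac12}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{R}^{\frac12}\mathbf{x}_j - \mathrm{tr}\,\mathbf{R}^{\frac12}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{R}^{\frac12}\right| + \frac{\log N}{v}\right) \tag{100}$$
From Lemma 7, there exists $K > 0$ such that,
$$E\left|\frac1N\|\mathbf{x}_j\|^2 - 1\right|^6 \leq KN^{-3}\log^{12}N \tag{101}$$
$$E\frac{1}{N^6}\left|\mathbf{x}_j^H\mathbf{R}^{\frac12}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{E}\mathbf{D}_{(j)}^{-1}\mathbf{R}^{\frac12}\mathbf{x}_j - \mathrm{tr}\,\mathbf{R}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{E}\mathbf{D}_{(j)}^{-1}\right|^6 \leq KN^{-3}v^{-12}\log^{24}N \tag{102}$$
$$E\frac{1}{N^6}\left|\mathbf{x}_j^H\mathbf{R}^{\frac12}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{R}^{\frac12}\mathbf{x}_j - \mathrm{tr}\,\mathbf{R}^{\frac12}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{R}^{\frac12}\right|^6 \leq KN^{-3}v^{-6}\log^{18}N \tag{103}$$
All three moments, when multiplied by $n$ times any power of $\log N$, are summable. Applying standard arguments using the Borel-Cantelli lemma and Boole's inequality (on $4n$ events), we conclude that, for any $k > 0$, $\log^kN\max_{j\leq n}\tau_jd_j \xrightarrow{\rm a.s.} 0$ as $N \to \infty$. Hence Equations (88) and (89).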
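C. Existence and uniqueness of $m_N^{(0)}(z)$
We show now that for any $N$, $n$, $\mathbf{S}$, $\mathbf{R}$, $N \times N$ nonnegative definite and $\mathbf{T} = \mathrm{diag}(\tau_1,\ldots,\tau_N)$, $\tau_k \geq 0$ for all $1 \leq k \leq N$, there exists a unique $e$ with positive imaginary part for which
$$e = \frac1N\mathrm{tr}\left(\mathbf{S} + \left[\int\frac{\tau}{1+c_N\tau e}\,dF^{\mathbf{T}}(\tau)\right]\mathbf{R} - z\mathbf{I}_N\right)^{-1}\mathbf{R} \tag{104}$$
For existence we consider the subsequences $\{N_j\}$, $\{n_j\}$ with $N_j = jN$, $n_j = jn$, so that $c_{N_j}$ remains $c_N$, and form the block diagonal matrices
$$\mathbf{R}_{N_j} = \mathrm{diag}(\mathbf{R},\mathbf{R},\ldots,\mathbf{R}), \quad \mathbf{S}_{N_j} = \mathrm{diag}(\mathbf{S},\mathbf{S},\ldots,\mathbf{S}) \tag{105}$$
both $jN \times jN$ and
$$\mathbf{T}_{N_j} = \mathrm{diag}(\mathbf{T},\mathbf{T},\ldots,\mathbf{T}) \tag{106}$$
of size $jn \times jn$.
We see that $F^{\mathbf{T}_{N_j}} = F^{\mathbf{T}}$ and the right side of (104) remains unchanged for all $N_j$. Consider a realization where $w_{N_j}^e \to 0$ as $j \to \infty$. We have $|e_{N_j}(z)| = |(jN)^{-1}\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_{jN} - z\mathbf{I}_N)^{-1}| \leq v^{-1}\log N$, remaining bounded as $j \to \infty$. Consider then a subsequence for which $e_{N_j}$ converges to, say, $e$. From (80), we see that
$$\left|\frac{\tau}{1+c_N\tau e_{N_j}}\right| \leq \tau|z|v^{-1} \tag{107}$$
so that from the dominated convergence theorem we have
$$\int\frac{\tau}{1+c_N\tau e_{N_j}(z)}\,dF^{\mathbf{T}}(\tau) \to \int\frac{\tau}{1+c_N\tau e}\,dF^{\mathbf{T}}(\tau) \tag{108}$$
along this subsequence. Therefore $e$ solves (104).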
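We now show uniqueness. Let $e$ be a solution to (104) and let $e_2 = \Im[e]$. Recalling the definition of $\mathbf{D}$ we write
$$e = \frac1N\mathrm{tr}\left(\mathbf{D}^{-1}\mathbf{R}\mathbf{D}^{-H}\left(\mathbf{S} + \left[\int\frac{\tau}{1+c_N\tau e^*}\,dF^{\mathbf{T}}(\tau)\right]\mathbf{R} - z^*\mathbf{I}\right)\right) \tag{109}$$
We see that since both $\mathbf{R}$ and $\mathbf{S}$ are Hermitian nonnegative definite, $\mathrm{tr}\left(\mathbf{D}^{-1}\mathbf{R}\mathbf{D}^{-H}\mathbf{S}\right)$ is real and nonnegative. Therefore we can write
$$e_2 = \frac1N\mathrm{tr}\left(\mathbf{D}^{-1}\mathbf{R}(\mathbf{D}^H)^{-1}\left(\left[\int\frac{c_N\tau^2e_2}{|1+c_N\tau e|^2}\,dF^{\mathbf{T}}(\tau)\right]\mathbf{R} + v\mathbf{I}_N\right)\right) = e_2\alpha + v\beta \tag{110}$$
where we denoted
$$\alpha = \frac1N\mathrm{tr}\left(\mathbf{D}^{-1}\mathbf{R}(\mathbf{D}^H)^{-1}\left[\int\frac{c_N\tau^2}{|1+c_N\tau e|^2}\,dF^{\mathbf{T}}(\tau)\right]\mathbf{R}\right) \tag{111}$$
$$\beta = \frac1N\mathrm{tr}\left(\mathbf{D}^{-1}\mathbf{R}(\mathbf{D}^H)^{-1}\right) \tag{112}$$
Let $\underline{e}$ be another solution to (104), with $\underline{e}_2 = \Im[\underline{e}]$, and analogously we can write $\underline{e}_2 = \underline{e}_2\underline{\alpha} + v\underline{\beta}$. Let $\underline{\mathbf{D}}$ denote $\mathbf{D}$ with $e$ replaced by $\underline{e}$. Then we have $e - \underline{e} = \gamma(e - \underline{e})$ where
$$\gamma = \int\frac{c_N\tau^2}{(1+c_N\tau e)(1+c_N\tau\underline{e})}\,dF^{\mathbf{T}}(\tau)\,\frac{\mathrm{tr}\,\mathbf{D}^{-1}\mathbf{R}\underline{\mathbf{D}}^{-1}\mathbf{R}}{N} \tag{113}$$
If $\mathbf{R}$ is the zero matrix, then $\gamma = 0$, and $e = \underline{e}$ would follow. For $\mathbf{R} \neq 0$ we use Cauchy-Schwarz to find
$$|\gamma| \leq \left(\int\frac{c_N\tau^2}{|1+c_N\tau e|^2}\,dF^{\mathbf{T}}(\tau)\,\frac{\mathrm{tr}\,\mathbf{D}^{-1}\mathbf{R}(\mathbf{D}^H)^{-1}\mathbf{R}}{N}\right)^{\frac12}\left(\int\frac{c_N\tau^2}{|1+c_N\tau\underline{e}|^2}\,dF^{\mathbf{T}}(\tau)\,\frac{\mathrm{tr}\,\underline{\mathbf{D}}^{-1}\mathbf{R}(\underline{\mathbf{D}}^H)^{-1}\mathbf{R}}{N}\right)^{\frac12} \tag{114}$$
$$= \alpha^{\frac12}\underline{\alpha}^{\frac12} \tag{115}$$
$$= \left(\frac{e_2\alpha}{e_2\alpha + v\beta}\right)^{\frac12}\left(\frac{\underline{e}_2\underline{\alpha}}{\underline{e}_2\underline{\alpha} + v\underline{\beta}}\right)^{\frac12} \tag{116}$$
Necessarily $\beta$ and $\underline{\beta}$ are positive since $\mathbf{R} \neq 0$. Therefore $|\gamma| < 1$ so we must have $e = \underline{e}$.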
D. Termination of the proof
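Let $e_N^0$ denote the solution to (104). We show now that, for any $\ell > 0$, almost surely
$$\lim_{N\to\infty}\log^\ell N\,(e_N - e_N^0) = 0 \tag{117}$$
Let $e_2^0 = \Im[e_N^0]$, and $\alpha^0 = \alpha_N^0$, $\beta^0 = \beta_N^0$ be the values as above for which $e_2^0 = e_2^0\alpha^0 + v\beta^0$. We have, using (72) and (80),
$$e_2^0\alpha_N^0/\beta_N^0 \leq e_2^0c_N\log N\int\frac{\tau^2}{|1+c_N\tau e_N^0|^2}\,dF^{\mathbf{T}}(\tau) \tag{118}$$
$$= -\log N\,\Im\left[\int\frac{\tau}{1+c_N\tau e_N^0}\,dF^{\mathbf{T}}(\tau)\right] \tag{119}$$
$$\leq \log^2N\,|z|v^{-1} \tag{120}$$
Therefore
$$\alpha^0 = \left(\frac{e_2^0\alpha^0}{e_2^0\alpha^0 + v\beta^0}\right) \tag{121}$$
$$= \left(\frac{e_2^0\alpha^0/\beta^0}{v + e_2^0\alpha^0/\beta^0}\right) \tag{122}$$
$$\leq \left(\frac{\log^2N\,|z|}{v^2 + \log^2N\,|z|}\right) \tag{123}$$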
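Let $\mathbf{D}^0$, $\mathbf{D}$ denote $\mathbf{D}$ as above with $e$ replaced by, respectively, $e_N^0$ and $e_N$. We have
\[ e_N = \frac{1}{N}\operatorname{tr}\mathbf{D}^{-1}\mathbf{R} - w_N^e \tag{124} \]
With $e_2 = \Im[e_N]$ we write as above
\[ e_2 = \frac{1}{N}\operatorname{tr}\left(\mathbf{D}^{-1}\mathbf{R}\mathbf{D}^{-H}\left(\left[\int\frac{c_N\tau^2 e_2}{|1 + c_N\tau e_N|^2}\,dF^{\mathbf{T}}(\tau)\right]\mathbf{R} + v\mathbf{I}_N\right)\right) - \Im[w_N^e] \tag{125} \]
\[ = e_2\alpha + v\beta - \Im[w_N^e] \tag{126} \]
We have as above $e_N - e_N^0 = \gamma(e_N - e_N^0) + w_N^e$ where now
\[ |\gamma| \le (\alpha^0)^{\frac12}\alpha^{\frac12} \tag{127} \]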
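Fix an $\ell > 0$ and consider a realization for which $\log^{\ell'}N\,w_N^e \to 0$, where $\ell' = \max(\ell + 1, 4)$, and $N$ large enough so that
\[ |w_N^e| \le \frac{v^3}{4c_N|z|^2\log^3 N} \tag{128} \]
Suppose $\beta \le \frac{v^2}{4c_N|z|^2\log^3 N}$. Then by Equations (72) and (80) we get
\[ \alpha \le c_N v^{-2}|z|^2\log^3 N\,\beta \le 1/4 \tag{129} \]
which implies $|\gamma| \le 1/2$. Otherwise we get from (121) and (128)
\[ |\gamma| \le (\alpha^0)^{\frac12}\left(\frac{e_2\alpha}{e_2\alpha + v\beta - \Im[w_N^e]}\right)^{\frac12} \tag{130} \]
\[ \le \left(\frac{\log^2 N\,|z|}{v^2 + \log^2 N\,|z|}\right)^{\frac12} \tag{131} \]
Therefore for all $N$ large
\[ \log^{\ell}N\,|e_N - e_N^0| \le \frac{(\log^{\ell}N)\,w_N^e}{1 - \left(\frac{\log^2 N\,|z|}{v^2 + \log^2 N\,|z|}\right)^{\frac12}} \tag{132} \]
\[ \le 2v^{-2}(v^2 + \log^2 N\,|z|)(\log^{\ell}N)\,w_N^e \tag{133} \]
\[ \to 0 \tag{134} \]
as $N \to \infty$. Therefore (117) follows.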
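Let $m_N^0 = N^{-1}\operatorname{tr}(\mathbf{D}^0)^{-1}$. We finally show
\[ m_N - m_N^0 \xrightarrow{\text{a.s.}} 0 \tag{135} \]
as $N \to \infty$. Since $m_N = N^{-1}\operatorname{tr}\mathbf{D}^{-1} - w_N^m$, we have
\[ m_N - m_N^0 = \gamma(e_N - e_N^0) - w_N^m \tag{136} \]
where now
\[ \gamma = \int\frac{c_N\tau^2}{(1 + c_N\tau e_N)(1 + c_N\tau e_N^0)}\,dF^{\mathbf{T}}(\tau)\,\frac{\operatorname{tr}\mathbf{D}^{-1}\mathbf{R}(\mathbf{D}^0)^{-1}}{N} \tag{137} \]
From (72) and (80) we get $|\gamma| \le c_N|z|^2 v^{-4}\log^3 N$. Therefore, from (88) and (117), we get (135).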
Returning to the original assumptions on $X_{11}$, $\mathbf{T}$, and $\mathbf{R}$, for each of a countably infinite collection of $z$ with positive imaginary part, possessing a cluster point with positive imaginary part, we have (135). Therefore, by Vitali's convergence theorem, page 168 of [16], for any $\varepsilon > 0$ we have with probability one $m_N(z) - m_N^0(z) \to 0$ uniformly in any region of $\mathbb{C}$ bounded by a contour interior to
\[ \mathbb{C} \setminus \left(\{z : |z| \le \varepsilon\} \cup \{z = x + iv : x > 0, |v| \le \varepsilon\}\right) \tag{138} \]
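If $\mathbf{S} = f(\mathbf{R})$, meaning the eigenvalues of $\mathbf{R}$ are changed via $f$ in the spectral decomposition of $\mathbf{R}$, then we have
\[ m_N^0(z) = \int\frac{1}{f(r) + r\int\frac{\tau}{1 + c_N\tau e_N^0(z)}\,dF^{\mathbf{T}}(\tau) - z}\,dF^{\mathbf{R}}(r) \tag{139} \]
\[ e_N^0(z) = \int\frac{r}{f(r) + r\int\frac{\tau}{1 + c_N\tau e_N^0(z)}\,dF^{\mathbf{T}}(\tau) - z}\,dF^{\mathbf{R}}(r) \tag{140} \]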
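E. Extension to $K \ge 1$
Suppose now
\[ \mathbf{B}_N = \mathbf{S} + \sum_{k=1}^{K}\mathbf{R}_k^{\frac12}\mathbf{X}_k\mathbf{T}_k\mathbf{X}_k^H\mathbf{R}_k^{\frac12} \tag{141} \]
where $K$ remains fixed, $\mathbf{X}_k$ is $N \times n_k$ satisfying 1), the $\mathbf{X}_k$'s are independent, $\mathbf{R}_k$ satisfies 2) and 4), $\mathbf{T}_k$ is $n_k \times n_k$ satisfying 3) and 4), $c_k = N/n_k$ satisfies 6), and $\mathbf{S}$ satisfies 5). After truncation and centralization we may assume the same condition on the entries of the $\mathbf{X}_k$'s, and the spectral norms of the $\mathbf{R}_k$'s and the $\mathbf{T}_k$'s. Write $\mathbf{y}_{k,j} = (1/\sqrt{n_k})\mathbf{R}_k^{\frac12}\mathbf{x}_{k,j}$, with $\mathbf{x}_{k,j}$ denoting the $j$-th column of $\mathbf{X}_k$, and let $\tau_{k,j}$ denote the $j$-th diagonal element of $\mathbf{T}_k$. Then we can write
\[ \mathbf{B}_N = \mathbf{S} + \sum_{k=1}^{K}\sum_{j=1}^{n_k}\tau_{k,j}\mathbf{y}_{k,j}\mathbf{y}_{k,j}^H \tag{142} \]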
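Define
\[ e_{N,k} = e_{N,k}(z) = (1/N)\operatorname{tr}\mathbf{R}_k(\mathbf{B}_N - z\mathbf{I}_N)^{-1} \tag{143} \]
and
\[ p_k = -\frac{1}{n_k z}\sum_{j=1}^{n_k}\frac{\tau_{k,j}}{1 + c_k\tau_{k,j}e_{N,k}} \tag{144} \]
\[ = \int\frac{-\tau_k}{z(1 + c_k\tau_k e_{N,k})}\,dF^{\mathbf{T}_k}(\tau_k) \tag{145} \]
We see $e_{N,k}$ and $p_k$ have the same properties as $e_N$ and $p_N$. Let $\mathbf{B}_{k,(j)} = \mathbf{B}_N - \tau_{k,j}\mathbf{y}_{k,j}\mathbf{y}_{k,j}^H$. Define $\mathbf{D} = -z\mathbf{I}_N + \mathbf{S} - \sum_{k=1}^{K} z p_k(z)\mathbf{R}_k$. We write
\[ \mathbf{B}_N - z\mathbf{I}_N - \mathbf{D} = \sum_{k=1}^{K}\left(\sum_{j=1}^{n_k}\tau_{k,j}\mathbf{y}_{k,j}\mathbf{y}_{k,j}^H + z p_k(z)\mathbf{R}_k\right) \tag{146} \]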
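Taking inverses and using Lemma 4, we have
\[ \mathbf{D}^{-1} - (\mathbf{B}_N - z\mathbf{I}_N)^{-1} = \sum_{k=1}^{K}\left(\sum_{j=1}^{n_k}\tau_{k,j}\mathbf{D}^{-1}\mathbf{y}_{k,j}\mathbf{y}_{k,j}^H(\mathbf{B}_N - z\mathbf{I}_N)^{-1} + z p_k\mathbf{D}^{-1}\mathbf{R}_k(\mathbf{B}_N - z\mathbf{I}_N)^{-1}\right) \tag{147} \]
\[ = \sum_{k=1}^{K}\left(\sum_{j=1}^{n_k}\tau_{k,j}\frac{\mathbf{D}^{-1}\mathbf{y}_{k,j}\mathbf{y}_{k,j}^H(\mathbf{B}_{k,(j)} - z\mathbf{I}_N)^{-1}}{1 + \tau_{k,j}\mathbf{y}_{k,j}^H(\mathbf{B}_{k,(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_{k,j}} + z p_k\mathbf{D}^{-1}\mathbf{R}_k(\mathbf{B}_N - z\mathbf{I}_N)^{-1}\right) \tag{148} \]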
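Taking traces and dividing by $N$, we have
\[ (1/N)\operatorname{tr}\mathbf{D}^{-1} - m_N(z) = \sum_{k=1}^{K}\frac{1}{n_k}\sum_{j=1}^{n_k}\tau_{k,j}d_{k,j} \equiv w_N^m \tag{149} \]
where
\[ d_{k,j} = \frac{(1/N)\mathbf{x}_{k,j}^H\mathbf{R}_k^{\frac12}(\mathbf{B}_{k,(j)} - z\mathbf{I}_N)^{-1}\mathbf{D}^{-1}\mathbf{R}_k^{\frac12}\mathbf{x}_{k,j}}{1 + \tau_{k,j}\mathbf{y}_{k,j}^H(\mathbf{B}_{k,(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_{k,j}} - \frac{(1/N)\operatorname{tr}\mathbf{R}_k(\mathbf{B}_N - z\mathbf{I}_N)^{-1}\mathbf{D}^{-1}}{1 + c_k\tau_{k,j}e_{N,k}} \tag{150} \]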
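For a fixed $\underline{k} \in \{1, \ldots, K\}$, we multiply the above matrix identity by $\mathbf{R}_{\underline{k}}$, take traces and divide by $N$. Thus we get
\[ (1/N)\operatorname{tr}\mathbf{D}^{-1}\mathbf{R}_{\underline{k}} - e_{N,\underline{k}}(z) = \sum_{k=1}^{K}\frac{1}{n_k}\sum_{j=1}^{n_k}\tau_{k,j}d^{e}_{\underline{k}kj} \equiv w^{e}_{\underline{k}} \tag{151} \]
where
\[ d^{e}_{\underline{k}kj} = \frac{(1/N)\mathbf{x}_{k,j}^H\mathbf{R}_k^{\frac12}(\mathbf{B}_{k,(j)} - z\mathbf{I}_N)^{-1}\mathbf{R}_{\underline{k}}\mathbf{D}^{-1}\mathbf{R}_k^{\frac12}\mathbf{x}_{k,j}}{1 + \tau_{k,j}\mathbf{y}_{k,j}^H(\mathbf{B}_{k,(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_{k,j}} - \frac{(1/N)\operatorname{tr}\mathbf{R}_k(\mathbf{B}_N - z\mathbf{I}_N)^{-1}\mathbf{R}_{\underline{k}}\mathbf{D}^{-1}}{1 + c_k\tau_{k,j}e_{N,k}} \tag{152} \]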
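In exactly the same way as in the case with $K = 1$ we find that for any nonnegative $\ell$, $\log^{\ell}N\,w_N^m$ and the $\log^{\ell}N\,w_i^e$'s converge almost surely to zero. By considering block diagonal matrices as before with $N$, the $n_i$'s, $\mathbf{S}$, the $\mathbf{R}_i$'s and the $\mathbf{T}_i$'s all fixed, we find that there exist $e_1^0, \ldots, e_K^0$ with positive imaginary parts for which, for each $i$,
\[ e_i^0 = \frac{1}{N}\operatorname{tr}\mathbf{R}_i\left(\mathbf{S} + \sum_{k=1}^{K}\left[\int\frac{\tau}{1 + c_k\tau e_k^0}\,dF^{\mathbf{T}_k}(\tau)\right]\mathbf{R}_k - z\mathbf{I}_N\right)^{-1} \tag{153} \]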
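Let us verify uniqueness. Let $\mathbf{e}^0 = (e_1^0, \ldots, e_K^0)^T$, and let $\mathbf{D}^0$ denote the matrix in (153) whose inverse is taken (essentially $\mathbf{D}$ after the $e_{N,i}$'s are replaced by the $e_i^0$'s). Let, for each $j$, $e_{j,2}^0 = \Im e_j^0$, and $\mathbf{e}_2^0 = (e_{1,2}^0, \ldots, e_{K,2}^0)^T$. Then, noticing that for each $i$, $\operatorname{tr}\mathbf{S}(\mathbf{D}^0)^{-H}\mathbf{R}_i(\mathbf{D}^0)^{-1}$ is real and nonnegative (positive whenever $\mathbf{S} \ne 0$) and $\operatorname{tr}(\mathbf{D}^0)^{-H}\mathbf{R}_i(\mathbf{D}^0)^{-1}$ and $\operatorname{tr}\mathbf{R}_j(\mathbf{D}^0)^{-H}\mathbf{R}_i(\mathbf{D}^0)^{-1}$ are real and positive for all $i, j$, we have
\[ e_{i,2}^0 = \Im\left[\frac{1}{N}\operatorname{tr}\left(\mathbf{S} + \sum_{j=1}^{K}\left[\int\frac{\tau}{1 + c_j\tau e_j^{0*}}\,dF^{\mathbf{T}_j}(\tau)\right]\mathbf{R}_j - z^*\mathbf{I}\right)(\mathbf{D}^0)^{-H}\mathbf{R}_i(\mathbf{D}^0)^{-1}\right] \tag{154} \]
\[ = \sum_{j=1}^{K} e_{j,2}^0\,\frac{1}{N}\operatorname{tr}\mathbf{R}_j(\mathbf{D}^0)^{-H}\mathbf{R}_i(\mathbf{D}^0)^{-1}\,c_j\int\frac{\tau^2}{|1 + c_j\tau e_j^0|^2}\,dF^{\mathbf{T}_j}(\tau) + \frac{v}{N}\operatorname{tr}(\mathbf{D}^0)^{-H}\mathbf{R}_i(\mathbf{D}^0)^{-1} \tag{155} \]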
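Let $\mathbf{C}^0 = (c_{ij}^0)$, $\mathbf{b}^0 = (b_1^0, \ldots, b_K^0)^T$, where
\[ c_{ij}^0 = \frac{1}{N}\operatorname{tr}\mathbf{R}_j(\mathbf{D}^0)^{-H}\mathbf{R}_i(\mathbf{D}^0)^{-1}\,c_j\int\frac{\tau^2}{|1 + c_j\tau e_j^0|^2}\,dF^{\mathbf{T}_j}(\tau) \tag{156} \]
and
\[ b_i^0 = \frac{1}{N}\operatorname{tr}(\mathbf{D}^0)^{-H}\mathbf{R}_i(\mathbf{D}^0)^{-1} \tag{157} \]
Therefore $\mathbf{e}_2^0$ satisfies
\[ \mathbf{e}_2^0 = \mathbf{C}^0\mathbf{e}_2^0 + v\mathbf{b}^0 \tag{158} \]
We see that each $e_{j,2}^0$, $c_{ij}^0$, and $b_j^0$ is positive. Therefore, from Lemma 9 we have $\rho(\mathbf{C}^0) < 1$.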
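Let $\underline{\mathbf{e}}^0 = (\underline{e}_1^0, \ldots, \underline{e}_K^0)^T$ be another solution to (153), with $\underline{\mathbf{e}}_2^0$, $\underline{\mathbf{D}}^0$, $\underline{\mathbf{C}}^0 = (\underline{c}_{ij}^0)$, $\underline{\mathbf{b}}^0$ defined analogously, so that (158) holds and $\rho(\underline{\mathbf{C}}^0) < 1$. We have for each $i$,
\[ e_i^0 - \underline{e}_i^0 = \frac{1}{N}\operatorname{tr}\mathbf{R}_i(\mathbf{D}^0)^{-1}\sum_{j=1}^{K}(e_j^0 - \underline{e}_j^0)\,c_j\int\frac{\tau^2}{(1 + c_j\tau e_j^0)(1 + c_j\tau\underline{e}_j^0)}\,dF^{\mathbf{T}_j}(\tau)\,\mathbf{R}_j(\underline{\mathbf{D}}^0)^{-1} \tag{159} \]
Thus with $\mathbf{A} = (a_{ij})$ where
\[ a_{ij} = \frac{1}{N}\operatorname{tr}\mathbf{R}_i(\mathbf{D}^0)^{-1}\mathbf{R}_j(\underline{\mathbf{D}}^0)^{-1}\,c_j\int\frac{\tau^2}{(1 + c_j\tau e_j^0)(1 + c_j\tau\underline{e}_j^0)}\,dF^{\mathbf{T}_j}(\tau) \tag{160} \]
we have
\[ \mathbf{e}^0 - \underline{\mathbf{e}}^0 = \mathbf{A}(\mathbf{e}^0 - \underline{\mathbf{e}}^0) \tag{161} \]
which means, if $\mathbf{e}^0 \ne \underline{\mathbf{e}}^0$, then $\mathbf{A}$ has an eigenvalue equal to 1.
Applying Cauchy-Schwarz we have
\[ |a_{ij}| \le \left(\frac{1}{N}\operatorname{tr}\mathbf{R}_i(\mathbf{D}^0)^{-1}\mathbf{R}_j(\mathbf{D}^0)^{-H}\,c_j\int\frac{\tau^2}{|1 + c_j\tau e_j^0|^2}\,dF^{\mathbf{T}_j}(\tau)\right)^{\frac12}\left(\frac{1}{N}\operatorname{tr}\mathbf{R}_i(\underline{\mathbf{D}}^0)^{-1}\mathbf{R}_j(\underline{\mathbf{D}}^0)^{-H}\,c_j\int\frac{\tau^2}{|1 + c_j\tau\underline{e}_j^0|^2}\,dF^{\mathbf{T}_j}(\tau)\right)^{\frac12} \tag{162} \]
\[ = (c_{ij}^0)^{1/2}(\underline{c}_{ij}^0)^{1/2} \tag{163} \]
Therefore from Lemmas 10 and 11 we get
\[ \rho(\mathbf{A}) \le \rho\left(\left((c_{ij}^0)^{\frac12}(\underline{c}_{ij}^0)^{\frac12}\right)\right) \le \rho(\mathbf{C}^0)^{\frac12}\rho(\underline{\mathbf{C}}^0)^{\frac12} < 1 \tag{164} \]
a contradiction to the statement that $\mathbf{A}$ has an eigenvalue equal to 1. Consequently we have $\mathbf{e}^0 = \underline{\mathbf{e}}^0$.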
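Let $\mathbf{e}_N = (e_{N,1}, \ldots, e_{N,K})^T$ and let $\mathbf{e}_N^0 = (e_{N,1}^0, \ldots, e_{N,K}^0)^T$ denote the vector solution to (153) for each $N$. We will show for any $\ell > 0$, almost surely
\[ \lim_{N\to\infty}\log^{\ell}N\,(\mathbf{e}_N - \mathbf{e}_N^0) = 0 \tag{165} \]
We have
\[ \mathbf{e}_N^0 = \left(\frac{1}{N}\operatorname{tr}\mathbf{R}_1(\mathbf{D}^0)^{-1}, \ldots, \frac{1}{N}\operatorname{tr}\mathbf{R}_K(\mathbf{D}^0)^{-1}\right)^T \tag{166} \]
Let $\mathbf{w}^e = \mathbf{w}_N^e = -(w_1^e, \ldots, w_K^e)^T$. Then we can write
\[ \mathbf{e}_N = \left(\frac{1}{N}\operatorname{tr}\mathbf{R}_1\mathbf{D}^{-1}, \ldots, \frac{1}{N}\operatorname{tr}\mathbf{R}_K\mathbf{D}^{-1}\right)^T + \mathbf{w}^e \tag{167} \]
Therefore
\[ \mathbf{e}_N - \mathbf{e}_N^0 = \mathbf{A}(N)(\mathbf{e}_N - \mathbf{e}_N^0) + \mathbf{w}^e \tag{168} \]
where $\mathbf{A}(N) = (a_{ij}(N))$ with
\[ a_{ij}(N) = \frac{1}{N}\operatorname{tr}\mathbf{R}_i\mathbf{D}^{-1}\mathbf{R}_j(\mathbf{D}^0)^{-1}\,c_j\int\frac{\tau^2}{(1 + c_j\tau e_{N,j})(1 + c_j\tau e_{N,j}^0)}\,dF^{\mathbf{T}_j}(\tau) \tag{169} \]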
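We let $\mathbf{e}_{N,2}^0$, $c_{ij}^0(N)$, $\mathbf{C}^0(N)$, $b_{N,i}^0$, and $\mathbf{b}_N^0$ denote the quantities from above, reflecting now their dependence on $N$. Let $\mathbf{C}(N) = (c_{ij}(N))$ be $K \times K$ with
\[ c_{ij}(N) = \frac{1}{N}\operatorname{tr}\mathbf{R}_j\mathbf{D}^{-H}\mathbf{R}_i\mathbf{D}^{-1}\,c_j\int\frac{\tau^2}{|1 + c_j\tau e_{N,j}|^2}\,dF^{\mathbf{T}_j}(\tau) \tag{170} \]
Let $\mathbf{e}_{N,2} = \Im[\mathbf{e}_N]$ and $\mathbf{w}_2^e = \Im[\mathbf{w}^e]$. Define $\mathbf{b}_N = (b_{N,1}, \ldots, b_{N,K})^T$ with
\[ b_{N,i} = \frac{1}{N}\operatorname{tr}\mathbf{D}^{-H}\mathbf{R}_i\mathbf{D}^{-1} \tag{171} \]
Then, as above we find that
\[ \mathbf{e}_{N,2} = \mathbf{C}(N)\mathbf{e}_{N,2} + v\mathbf{b}_N + \mathbf{w}_2^e \tag{172} \]
Using (72) and (80) we see there exists a constant $K_1 > 0$ for which
\[ c_{ij}^0(N) \le K_1\log^3 N\,b_{N,i}^0 \tag{173} \]
and
\[ c_{ij}(N) \le K_1\log^3 N\,b_{N,i} \tag{174} \]
\[ c_{ij}(N) \le K_1\log^4 N \tag{175} \]
for each $i, j$. Therefore, from (158) we see there exists $K > 0$ for which
\[ e_{N,i}^0 \le K\log^4 N\,v\,b_{N,i}^0 \tag{176} \]
Let $\mathbf{x}$ be such that $\mathbf{x}^T$ is a left eigenvector of $\mathbf{C}^0(N)$ corresponding to eigenvalue $\rho(\mathbf{C}^0(N))$, guaranteed by Lemma 12. Then from (172) we have
\[ \mathbf{x}^T\mathbf{e}_{N,2}^0 = \rho(\mathbf{C}^0(N))\,\mathbf{x}^T\mathbf{e}_{N,2}^0 + v\,\mathbf{x}^T\mathbf{b}_N^0 \tag{177} \]
Using (177) we have
\[ 1 - \rho(\mathbf{C}^0(N)) = \frac{v\,\mathbf{x}^T\mathbf{b}_N^0}{\mathbf{x}^T\mathbf{e}_{N,2}^0} \ge (K\log^4 N)^{-1} \tag{178} \]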
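Fix an $\ell > 0$ and consider a realization for which $\log^{\ell+3+p}N\,\mathbf{w}_N^e \to 0$, as $N \to \infty$, where $p \ge 12K - 7$. We will show for all $N$ large
\[ \rho(\mathbf{C}(N)) \le 1 + (K\log^4 N)^{-1} \tag{179} \]
For each $N$ we rearrange the entries of $\mathbf{e}_{N,2}$, $v\mathbf{b}_N + \mathbf{w}_2^e$, and $\mathbf{C}(N)$ depending on whether the $i$th entry of $v\mathbf{b}_N + \mathbf{w}_2^e$ is greater than, or less than or equal to zero. We can therefore assume
\[ \mathbf{C}(N) = \begin{pmatrix} \mathbf{C}_{11}(N) & \mathbf{C}_{12}(N) \\ \mathbf{C}_{21}(N) & \mathbf{C}_{22}(N) \end{pmatrix} \tag{180} \]
where $\mathbf{C}_{11}(N)$ is $k_1 \times k_1$, $\mathbf{C}_{22}(N)$ is $k_2 \times k_2$, $\mathbf{C}_{12}(N)$ is $k_1 \times k_2$, and $\mathbf{C}_{21}(N)$ is $k_2 \times k_1$. From Lemma 9 we have $\rho(\mathbf{C}_{11}(N)) < 1$. If $v b_{N,i} + w_{2,i}^e \le 0$, then necessarily $v b_{N,i} \le |\mathbf{w}_N^e| \le K_1(\log N)^{-(3+p)}$, and so from (176) we have the entries of $\mathbf{C}_{21}(N)$ and $\mathbf{C}_{22}(N)$ bounded by $K_1(\log N)^{-p}$. We may assume for all $N$ large $0 < k_1 < K$, since otherwise we would have $\rho(\mathbf{C}(N)) < 1$.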
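We seek an expression for $\det(\mathbf{C}(N) - \lambda\mathbf{I})$ in which Lemma 14 can be used. We consider $N$ large enough so that, for $|\lambda| \ge 1/2$, we have $(\mathbf{C}_{22}(N) - \lambda\mathbf{I})^{-1}$ existing with entries uniformly bounded. We have
\[ \det(\mathbf{C}(N) - \lambda\mathbf{I}) = \det\left[\begin{pmatrix} \mathbf{I} & -\mathbf{C}_{12}(N)(\mathbf{C}_{22}(N) - \lambda\mathbf{I})^{-1} \\ 0 & \mathbf{I} \end{pmatrix}\begin{pmatrix} \mathbf{C}_{11}(N) - \lambda\mathbf{I} & \mathbf{C}_{12}(N) \\ \mathbf{C}_{21}(N) & \mathbf{C}_{22}(N) - \lambda\mathbf{I} \end{pmatrix}\right] \tag{181} \]
\[ = \det\begin{pmatrix} \mathbf{C}_{11}(N) - \lambda\mathbf{I} - \mathbf{C}_{12}(N)(\mathbf{C}_{22}(N) - \lambda\mathbf{I})^{-1}\mathbf{C}_{21}(N) & 0 \\ \mathbf{C}_{21}(N) & \mathbf{C}_{22}(N) - \lambda\mathbf{I} \end{pmatrix} \tag{182} \]
\[ = \det\left(\mathbf{C}_{11}(N) - \lambda\mathbf{I} - \mathbf{C}_{12}(N)(\mathbf{C}_{22}(N) - \lambda\mathbf{I})^{-1}\mathbf{C}_{21}(N)\right)\det(\mathbf{C}_{22}(N) - \lambda\mathbf{I}) \tag{183} \]
We see then that for $\lambda = \rho(\mathbf{C}(N))$ real and greater than 1,
\[ \det\left(\mathbf{C}_{11}(N) - \lambda\mathbf{I} - \mathbf{C}_{12}(N)(\mathbf{C}_{22}(N) - \lambda\mathbf{I})^{-1}\mathbf{C}_{21}(N)\right) \tag{184} \]
must be zero.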
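The block factorization in (181)-(183) is the usual Schur-complement identity, and it can be checked on a small example. The sketch below is illustrative only: the $3 \times 3$ matrix `C`, the split `k1 = 2` and the value of `lam` are arbitrary choices, not quantities from the proof, and the trailing block $\mathbf{C}_{22}$ is taken to be $1 \times 1$ so that its inverse is a scalar division.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for tiny matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

# Arbitrary nonnegative test matrix; C11 is the leading k1 x k1 block.
C = [[0.5, 0.2, 0.1],
     [0.3, 0.4, 0.2],
     [0.05, 0.02, 0.01]]
k1, lam = 2, 1.2

# Left-hand side of (181): det(C - lam * I).
shifted = [[C[i][j] - (lam if i == j else 0.0) for j in range(3)] for i in range(3)]
lhs = det(shifted)

# Right-hand side of (183): det of the Schur complement times det(C22 - lam * I).
c22_shift = C[2][2] - lam
schur = [[shifted[i][j] - C[i][2] * C[2][j] / c22_shift for j in range(k1)]
         for i in range(k1)]
rhs = det(schur) * c22_shift
```

Since the second factor is nonzero whenever $\lambda$ is not an eigenvalue of $\mathbf{C}_{22}$, a zero of the full determinant at such $\lambda$ must come from the Schur-complement factor, which is the step used to reach (184).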
Notice that from (176), the entries of $\mathbf{C}_{12}(N)(\mathbf{C}_{22}(N) - \lambda\mathbf{I})^{-1}\mathbf{C}_{21}(N)$ can be made smaller than any negative power of $\log N$ for $p$ sufficiently large. Notice also that the diagonal elements of $\mathbf{C}_{11}(N)$ are all less than 1. From this, Lemma 13 and (176), we see that $\rho(\mathbf{C}(N)) \le K_1\log^4 N$. The determinant in (184) can be written as
\[ \det(\mathbf{C}_{11}(N) - \lambda\mathbf{I}) + g(\lambda) \tag{185} \]
where $g(\lambda)$ is a sum of products, each containing at least one entry from $\mathbf{C}_{12}(N)(\mathbf{C}_{22}(N) - \lambda\mathbf{I})^{-1}\mathbf{C}_{21}(N)$. Again, from (176) we see that for all $|\lambda| \ge 1/2$, $g(\lambda)$ can be made smaller than any negative power of $\log N$ by making $p$ sufficiently large. Choose $p$ so that $|g(\lambda)| < (K\log N)^{-4k_1}$ for these $\lambda$. It is clear that any $p > 8k_1 + 4$ will suffice. Let $\lambda_1, \ldots, \lambda_{k_1}$ denote the eigenvalues of $\mathbf{C}_{11}$. Since $\rho(\mathbf{C}_{11}) < 1$, we see that for $|\lambda| \ge (K\log N)^{-4}$, we have
\[ |\det(\mathbf{C}_{11}(N) - \lambda\mathbf{I})| = \left|\prod_{i=1}^{k_1}(\lambda_i - \lambda)\right| \tag{186} \]
\[ > (K\log N)^{-4k_1} \tag{187} \]
Thus with $f(\lambda) = \det(\mathbf{C}_{11}(N) - \lambda\mathbf{I})$, a polynomial, and $g(\lambda)$ being a rational function, we have the conditions of Lemma 14 being met on any rectangle $\mathcal{C}$, with vertical lines going through $((K\log N)^{-4}, 0)$ and $(K_1(\log N)^4, 0)$. Therefore, since $f(\lambda)$ has no zeros inside $\mathcal{C}$, neither does $\det(\mathbf{C}(N) - \lambda\mathbf{I})$. Thus we get (179).
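As before we see that
\[ |a_{ij}(N)| \le c_{ij}^{1/2}(N)\,(c_{ij}^0)^{1/2}(N) \tag{188} \]
Therefore, from (178), (179), and Lemmas 10 and 11, we have for all $N$ large
\[ \rho(\mathbf{A}(N)) \le \left(\frac{K^2\log^8 N - 1}{K^2\log^8 N}\right)^{\frac12} \tag{189} \]
For these $N$ we have then $\mathbf{I} - \mathbf{A}(N)$ invertible, and so
\[ \mathbf{e}_N - \mathbf{e}_N^0 = (\mathbf{I} - \mathbf{A}(N))^{-1}\mathbf{w}^e \tag{190} \]
By (72) and (80) we have the entries of $\mathbf{A}(N)$ bounded by $K_1\log^4 N$. Notice also, from (189),
\[ |\det(\mathbf{I} - \mathbf{A}(N))| \ge (1 - \rho(\mathbf{A}(N)))^K \ge \left(K^2\log^8 N\left(1 + \left(\frac{K^2\log^8 N - 1}{K^2\log^8 N}\right)^{\frac12}\right)\right)^{-K} \ge (2K^2\log^8 N)^{-K} \tag{191} \]
When considering the inverse of a square matrix in terms of its adjoint divided by its determinant, we see that the entries of $(\mathbf{I} - \mathbf{A}(N))^{-1}$ are bounded by
\[ \frac{(K - 1)!\,K_1^{K-1}(\log N)^{4(K-1)}}{|\det(\mathbf{I} - \mathbf{A}(N))|} \le K_3(\log N)^{12K-4} \tag{192} \]
Therefore, since $p \ge 12K - 7$ ($> 8k_1 + 4$), (165) follows on this realization, an event which occurs with probability one.
Letting $m_N^0 = \frac{1}{N}\operatorname{tr}(\mathbf{D}^0)^{-1}$, we have
\[ m_N - m_N^0 = \vec{\gamma}^T(\mathbf{e}_N - \mathbf{e}_N^0) - w_N^m \tag{193} \]
where $\vec{\gamma} = (\gamma_1, \ldots, \gamma_K)^T$ with
\[ \gamma_j = \int\frac{c_j\tau^2}{(1 + c_j\tau e_{N,j})(1 + c_j\tau e_{N,j}^0)}\,dF^{\mathbf{T}_j}(\tau)\,\frac{\operatorname{tr}\mathbf{D}^{-1}\mathbf{R}_j(\mathbf{D}^0)^{-1}}{N} \tag{194} \]
From (72) and (80) we get each $|\gamma_j| \le c_j|z|^2 v^{-4}\log^3 N$. Therefore from (165) and the fact that $w_N^m \to 0$, we have
\[ m_N - m_N^0 \to 0 \tag{195} \]
almost surely, as $N \to \infty$.
This completes the proof.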
APPENDIX B
PROOF OF THEOREM 2
We first prove that $V^{(0)}(x)$ as defined in Equation (22) verifies
\[ V^{(0)}(x) = \int_x^{\infty}\left(\frac{1}{w} - m_N^{(0)}(-w)\right)dw \tag{196} \]
and then we prove that, under the conditions of Theorem 2, $V^{(0)}(x)$ defined as such verifies
\[ V^{(0)}(x) - V(x) \xrightarrow{\text{a.s.}} 0 \tag{197} \]
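The integral representation in (196) mirrors the general link between a Stieltjes transform and the corresponding Shannon transform. As a minimal sketch, assuming a known finite spectrum (the list `eigs` below is hypothetical), the integral $\int_x^\infty (w^{-1} - m(-w))\,dw$ can be compared with the direct average of $\log(1 + \lambda_i/x)$. The substitution $w = x/t$ maps the infinite range to $(0, 1]$ so that a plain midpoint rule suffices.

```python
import math

def shannon_integral(m_at_minus, x, steps=20000):
    """Midpoint rule for the integral of (1/w - m(-w)) over [x, inf), using w = x/t."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        w = x / t
        # dw = x / t^2 dt after the change of variable.
        total += (1.0 / w - m_at_minus(w)) * x / (t * t)
    return total * h

eigs = [0.5, 2.0, 3.0]  # hypothetical eigenvalues standing in for a spectrum

def m_at_minus(w):
    """Stieltjes transform at z = -w: average of 1 / (lambda_i + w)."""
    return sum(1.0 / (lam + w) for lam in eigs) / len(eigs)

x = 1.0
via_integral = shannon_integral(m_at_minus, x)
via_logdet = sum(math.log(1.0 + lam / x) for lam in eigs) / len(eigs)
```

The two numbers agree up to quadrature error, which is the elementary identity that (196) transfers to the deterministic equivalent $m_N^{(0)}$.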
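A. Proof of (196)
First, observe that we can rewrite $e_i(z)$ under the symmetric form,
\[ e_i(z) = \frac{1}{N}\operatorname{tr}\mathbf{R}_i\left(-z\left[\mathbf{I}_N + \sum_{k=1}^{K}\delta_k\mathbf{R}_k\right]\right)^{-1} \tag{198} \]
\[ \delta_i(z) = \frac{1}{n_i}\operatorname{tr}\mathbf{T}_i\left(-z\left[\mathbf{I}_{n_i} + c_i e_i(z)\mathbf{T}_i\right]\right)^{-1} \tag{199} \]
and then for $m_N^{(0)}(z)$,
\[ m_N^{(0)}(z) = \frac{1}{N}\operatorname{tr}\left(-z\left[\mathbf{I}_N + \sum_{k=1}^{K}\delta_k\mathbf{R}_k\right]\right)^{-1} \tag{200} \]
Now, notice that
\[ \frac{1}{z} - m_N^{(0)}(-z) = \frac{1}{N}\operatorname{tr}\left[(z\mathbf{I})^{-1} - \left(z\left[\mathbf{I}_N + \sum_{k=1}^{K}\delta_k(-z)\mathbf{R}_k\right]\right)^{-1}\right] \tag{201} \]
\[ = \sum_{k=1}^{K}\delta_k(-z)\cdot e_k(-z) \tag{202} \]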
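Since the Shannon transform $V(x)$ satisfies $V(x) = \int_x^{+\infty}[w^{-1} - m_N(-w)]\,dw$, we need to find an integral form for $\sum_{k=1}^{K}\delta_k(-z)\cdot e_k(-z)$. Notice now that
\[ \frac{d}{dz}\frac{1}{N}\log\det\left(\mathbf{I}_N + \sum_{k=1}^{K}\delta_k(-z)\mathbf{R}_k\right) = -z\sum_{k=1}^{K}e_k(-z)\cdot\delta_k'(-z) \]
\[ \frac{d}{dz}\frac{1}{N}\log\det\left(\mathbf{I}_{n_k} + c_k e_k(-z)\mathbf{T}_k\right) = -z\cdot e_k'(-z)\cdot\delta_k(-z) \tag{203} \]
\[ \frac{d}{dz}\left(z\sum_{k=1}^{K}\delta_k(-z)e_k(-z)\right) = \sum_{k=1}^{K}\delta_k(-z)e_k(-z) - z\sum_{k=1}^{K}\left[\delta_k'(-z)\cdot e_k(-z) + \delta_k(-z)\cdot e_k'(-z)\right] \tag{204} \]
Combining the last three lines, we have
\[ \sum_{k=1}^{K}\delta_k(-z)e_k(-z) = \frac{d}{dz}\left[-\frac{1}{N}\log\det\left(\mathbf{I}_N + \sum_{k=1}^{K}\delta_k(-z)\mathbf{R}_k\right) - \sum_{k=1}^{K}\frac{1}{N}\log\det\left(\mathbf{I}_{n_k} + c_k e_k(-z)\mathbf{T}_k\right) + z\sum_{k=1}^{K}\delta_k(-z)e_k(-z)\right] \tag{205} \]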
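which after integration leads to
\[ \int_z^{+\infty}\left(\frac{1}{w} - m_N^{(0)}(-w)\right)dw = \frac{1}{N}\log\det\left(\mathbf{I}_N + \sum_{k=1}^{K}\delta_k(-z)\mathbf{R}_k\right) + \sum_{k=1}^{K}\frac{1}{N}\log\det\left(\mathbf{I}_{n_k} + c_k e_k(-z)\mathbf{T}_k\right) - z\sum_{k=1}^{K}\delta_k(-z)e_k(-z) \tag{206} \]
which is exactly the right-hand side of (22).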
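Identity (206) can be sanity-checked in the simplest uncorrelated case. The sketch below assumes $K = 1$, $\mathbf{R}_1 = \mathbf{I}_N$ and $\mathbf{T}_1 = \mathbf{I}_{n}$ with $c = N/n$, so that (198)-(199) at $z = -x$ reduce to the scalar system $e = 1/(x(1 + \delta))$, $\delta = 1/(x(1 + ce))$ and $m_N^{(0)}(-x) = e$. The function names and the plain fixed-point loop are illustrative choices, not part of the proof.

```python
import math

def solve_e_delta(x, c, iters=200):
    """Solve e = 1/(x(1+delta)), delta = 1/(x(1+c*e)) at z = -x (R_1 = I, T_1 = I)."""
    e = 1.0 / x
    for _ in range(iters):
        delta = 1.0 / (x * (1.0 + c * e))
        new = 1.0 / (x * (1.0 + delta))
        converged = abs(new - e) <= 1e-15 * abs(new)
        e = new
        if converged:
            break
    delta = 1.0 / (x * (1.0 + c * e))
    return e, delta

def closed_form(x, c):
    """Right-hand side of (206) for the scalar case; (1/N) log det(I_n + c e T) = log(1 + c e)/c."""
    e, delta = solve_e_delta(x, c)
    return math.log(1.0 + delta) + math.log(1.0 + c * e) / c - x * delta * e

def by_integration(x, c, steps=4000):
    """Left-hand side of (206) by the midpoint rule with w = x/t."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        w = x / t
        e, _ = solve_e_delta(w, c)  # equals m^(0)(-w) because R_1 = I
        total += (1.0 / w - e) * x / (t * t)
    return total * h

v_closed = closed_form(1.0, 1.0)
v_quad = by_integration(1.0, 1.0)
```

For $x = c = 1$ both sides evaluate to $2\log\frac{1 + \sqrt{5}}{2} - \frac{3 - \sqrt{5}}{2} \approx 0.5805$, the familiar Marchenko-Pastur value of the Shannon transform at unit SNR.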
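B. Proof of (197)
Consider now the existence of a nonrandom $\alpha$ and for each $N$ a non-negative integer $r_N$ for which
\[ \max_{i \le K}\max\left(\lambda^{\mathbf{T}_i}_{r_N+1}, \lambda^{\mathbf{R}_i}_{r_N+1}\right) \le \alpha \tag{207} \]
(eigenvalues also arranged in non-increasing order). Then for each $i$
\[ \lambda^{\mathbf{R}_i^{1/2}\mathbf{X}_i\mathbf{T}_i\mathbf{X}_i^H\mathbf{R}_i^{1/2}}_{2r_N+1} = \left(s^{\mathbf{R}_i^{1/2}\mathbf{X}_i\mathbf{T}_i^{1/2}}_{2r_N+1}\right)^2 \tag{208} \]
\[ \le \alpha^2\|\mathbf{X}_i\mathbf{X}_i^H\| \tag{209} \]
and then we have, from Lemma 15,
\[ \lambda^{\mathbf{B}_N}_{2Kr_N+1} \le \alpha^2\left(\|\mathbf{X}_1\mathbf{X}_1^H\| + \cdots + \|\mathbf{X}_K\mathbf{X}_K^H\|\right) \tag{210} \]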
We can in fact consider that the spectral norms of the $\mathbf{X}_i$ are bounded in the limit. This holds under either Gaussian assumptions on the components or a finite fourth moment, provided the entries all come from doubly infinite arrays (remember though that we need the right-unitary invariance structure of $\mathbf{X}_i$). Because of assumption 5 in Corollary 1, we can, by enlarging the sample space, assume each $\mathbf{X}_i$ is embedded in an $N \times n_i'$ matrix $\mathbf{X}_i'$, where $N/n_i' \to a$ as $N \to \infty$. Then, with probability one (see e.g. [33]),
\[ \limsup_N \lambda^{\mathbf{B}_N}_{2Kr_N+1} \le \limsup_N \alpha^2\left(\|\mathbf{X}_1'\mathbf{X}_1'^H\| + \cdots + \|\mathbf{X}_K'\mathbf{X}_K'^H\|\right) \le \alpha^2 K(b/a)(1 + \sqrt{a})^2 \tag{211} \]
Let $a_0$ be any real greater than $\alpha^2 K(b/a)(1 + \sqrt{a})^2$.
Since $\mathbf{S} = 0$ here, it follows as in [4] that $\{F^{\mathbf{B}_N}\}$ is almost surely tight. Let $F_N^0$ denote the distribution function having Stieltjes transform $m_N^{(0)}$, and let $f$ on $[0, \infty)$ be a continuous function. Then the function
\[ f_{a_0}(x) = \begin{cases} f(x), & x \le a_0 \\ f(a_0), & x > a_0 \end{cases} \tag{212} \]
is bounded and continuous. Therefore, with probability 1,
\[ \int f_{a_0}(x)\,dF^{\mathbf{B}_N}(x) - \int f_{a_0}(x)\,dF_N^0(x) \to 0 \tag{213} \]
as $N \to \infty$.
Suppose now $r_N = o(N)$. Then, since almost surely there are at most $2Kr_N$ eigenvalues greater than $a_0$ for all $N$ large, any converging subsequence of $\{F_N^0\}$ must have some mass lying on $[0, a_0]$. This implies, with probability 1,
\[ \frac{1}{N}\sum_{\lambda_i \le a_0} f(\lambda_i) - \int_{[0,a_0]} f(x)\,dF_N^0(x) \to 0 \tag{214} \]
as $N \to \infty$.
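Let $b_N$ be a bound on the spectral norms of the $\mathbf{T}_i$ and $\mathbf{R}_i$. Then
\[ \|\mathbf{B}_N\| \le b_N^2\left(\|\mathbf{X}_1'\mathbf{X}_1'^H\| + \cdots + \|\mathbf{X}_K'\mathbf{X}_K'^H\|\right) \tag{215} \]
Fix a number $\beta > K(b/a)(1 + \sqrt{a})^2$, and let $a_N = b_N^2\beta$. Suppose also that $f$ is increasing and that $f(a_N)r_N = o(N)$. Then
\[ \int f(x)\,dF^{\mathbf{B}_N}(x) - \frac{1}{N}\sum_{\lambda_i \le a_0} f(\lambda_i) \to 0 \tag{216} \]
almost surely, as $N \to \infty$. Therefore, with probability 1,
\[ \int f(x)\,dF^{\mathbf{B}_N}(x) - \int_{[0,a_0]} f(x)\,dF_N^0(x) \to 0 \tag{217} \]
as $N \to \infty$.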
For any $N$ we consider, for $j = 1, 2, \ldots$, the $jN \times jN$ matrix $\mathbf{B}_{N,j}$ formed, as before, from block diagonal matrices and $jN \times jn_i$ matrices of i.i.d. variables. Then with probability 1, $F^{\mathbf{B}_{N,j}}$ converges weakly to $F_N^0$ as $j \to \infty$. Properties on the eigenvalues of $\mathbf{B}_{N,j}$ will thus yield properties of $F_N^0$.
By considering the bound on $\|\mathbf{B}_{N,j}\|$ analogous to (215), we must have $F_N^0(a_N) = 1$ for all $N$ large.
Similar to (211) we see that, with probability 1,
\[ \limsup_j \lambda^{\mathbf{B}_{N,j}}_{2Kjr_N+1} \le a^2\left((1 + \sqrt{c_1})^2 + \cdots + (1 + \sqrt{c_K})^2\right) \tag{218} \]
this latter number being less than $a_0$ for all $N$ large.
At this point we will use the fact that for probability measures $P_N$, $P$ on $\mathbb{R}$ with $P_N$ converging weakly to $P$, we have (see e.g. [34])
\[ \liminf_N P_N(G) \ge P(G) \tag{219} \]
for any open set $G$. Thus, with $G = (a_0, \infty)$ we see that, with probability 1, for all $N$ large
\[ F_N^0((a_0, \infty)) = 1 - F_N^0(a_0) \le \liminf_j F^{\mathbf{B}_{N,j}}((a_0, \infty)) \tag{220} \]
\[ \le 2Kr_N/N \tag{221} \]
Therefore, for all $N$ large
\[ \int_{(a_0,\infty)} f(x)\,dF_N^0(x) \le f(a_N)\,2Kr_N/N \to 0 \tag{222} \]
as $N \to \infty$.
Therefore, we conclude that $\int f(x)\,dF_N^0(x)$ is bounded, and with probability 1
\[ \int f(x)\,dF^{\mathbf{B}_N}(x) - \int f(x)\,dF_N^0(x) \to 0 \tag{223} \]
as $N \to \infty$. This concludes the proof.
APPENDIX C
PROOF OF PROPOSITION 1
The proof stems from the following result,
Proposition 3: $f(\mathbf{P}_1, \ldots, \mathbf{P}_K)$ is strictly concave in the Hermitian nonnegative definite matrices $\mathbf{P}_1, \ldots, \mathbf{P}_K$ if and only if, for any couples $(\mathbf{P}_{1_a}, \mathbf{P}_{1_b}), \ldots, (\mathbf{P}_{K_a}, \mathbf{P}_{K_b})$ of Hermitian nonnegative definite matrices, the function
\[ \phi(\lambda) = f\left(\lambda\mathbf{P}_{1_a} + (1 - \lambda)\mathbf{P}_{1_b}, \ldots, \lambda\mathbf{P}_{K_a} + (1 - \lambda)\mathbf{P}_{K_b}\right) \tag{224} \]
is strictly concave.
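Let us use a notation similar to that of the capacity in (31),
\[ I(\lambda) = I\left(\lambda\mathbf{P}_{1_a} + (1 - \lambda)\mathbf{P}_{1_b}, \ldots, \lambda\mathbf{P}_{|S|_a} + (1 - \lambda)\mathbf{P}_{|S|_b}\right) \tag{225} \]
and consider a set $(\delta_k, e_k, \mathbf{P}_1, \ldots, \mathbf{P}_{|S|})$ which satisfies the system of equations (31)-(33). Then, from remark (36) and (37),
\[ \frac{dI}{d\lambda} = \sum_{k \in S}\left[\frac{\partial V}{\partial\delta_k}\frac{\partial\delta_k}{\partial\lambda} + \frac{\partial V}{\partial e_k}\frac{\partial e_k}{\partial\lambda}\right] + \frac{\partial V}{\partial\lambda} \tag{226} \]
\[ = \frac{\partial V}{\partial\lambda} \tag{227} \]
where
\[ V : (\delta_1, \ldots, \delta_{|S|}, e_1, \ldots, e_{|S|}, \lambda) \mapsto I(\lambda) \tag{228} \]
Direct differentiation of $V$ then leads to
\[ \frac{\partial^2 V}{\partial\lambda^2} = -\sum_{i \in S}(c_i^2 e_i^2)\frac{1}{N}\operatorname{tr}\left(\mathbf{I} + c_i e_i\mathbf{R}_i\mathbf{P}_i\right)^{-2}\left(\mathbf{R}_i(\mathbf{P}_{i_a} - \mathbf{P}_{i_b})\right)^2 \tag{229} \]
Since $e_i > 0$ on the strictly negative real axis, if any of the $\mathbf{R}_i$'s is positive definite, then, for all nonnegative definite couples $(\mathbf{P}_{i_a}, \mathbf{P}_{i_b})$ such that $\mathbf{P}_{i_a} \ne \mathbf{P}_{i_b}$, $I'' < 0$. Then, from Proposition 3, the deterministic approximate on the right-hand side of (27) is strictly concave in $\mathbf{P}_1, \ldots, \mathbf{P}_{|S|}$ if any of the $\mathbf{R}_i$ matrices is invertible.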
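APPENDIX D
USEFUL LEMMAS
In this section, we gather most of the known or new lemmas which are needed in various places in Proof A.
The statements in the following Lemma are well-known.
Lemma 1: 1) For rectangular matrices $\mathbf{A}$, $\mathbf{B}$ of the same size,
\[ \operatorname{rank}(\mathbf{A} + \mathbf{B}) \le \operatorname{rank}(\mathbf{A}) + \operatorname{rank}(\mathbf{B}) \tag{230} \]
2) For rectangular matrices $\mathbf{A}$, $\mathbf{B}$ for which $\mathbf{AB}$ is defined,
\[ \operatorname{rank}(\mathbf{AB}) \le \min(\operatorname{rank}(\mathbf{A}), \operatorname{rank}(\mathbf{B})) \tag{231} \]
3) For rectangular $\mathbf{A}$, $\operatorname{rank}(\mathbf{A})$ is at most the number of non-zero entries of $\mathbf{A}$.
Lemma 2: (Lemma 2.4 of [4]) For $N \times N$ Hermitian matrices $\mathbf{A}$ and $\mathbf{B}$,
\[ \|F^{\mathbf{A}} - F^{\mathbf{B}}\| \le \frac{1}{N}\operatorname{rank}(\mathbf{A} - \mathbf{B}) \tag{232} \]
From these two lemmas we get the following.
Lemma 3: Let $\mathbf{S}$, $\mathbf{A}$, $\overline{\mathbf{A}}$ be Hermitian $N \times N$, $\mathbf{Q}$, $\overline{\mathbf{Q}}$ both $N \times n$, and $\mathbf{B}$, $\overline{\mathbf{B}}$ both Hermitian $n \times n$. Then
1)
\[ \|F^{\mathbf{S} + \mathbf{A}\mathbf{Q}\mathbf{B}\mathbf{Q}^H\mathbf{A}} - F^{\mathbf{S} + \mathbf{A}\overline{\mathbf{Q}}\mathbf{B}\overline{\mathbf{Q}}^H\mathbf{A}}\| \le \frac{2}{N}\operatorname{rank}(\mathbf{Q} - \overline{\mathbf{Q}}) \tag{233} \]
2)
\[ \|F^{\mathbf{S} + \mathbf{A}\mathbf{Q}\mathbf{B}\mathbf{Q}^H\mathbf{A}} - F^{\mathbf{S} + \overline{\mathbf{A}}\mathbf{Q}\mathbf{B}\mathbf{Q}^H\overline{\mathbf{A}}}\| \le \frac{2}{N}\operatorname{rank}(\mathbf{A} - \overline{\mathbf{A}}) \tag{234} \]
and
3)
\[ \|F^{\mathbf{S} + \mathbf{A}\mathbf{Q}\mathbf{B}\mathbf{Q}^H\mathbf{A}} - F^{\mathbf{S} + \mathbf{A}\mathbf{Q}\overline{\mathbf{B}}\mathbf{Q}^H\mathbf{A}}\| \le \frac{1}{N}\operatorname{rank}(\mathbf{B} - \overline{\mathbf{B}}) \tag{235} \]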
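Lemma 4: For $N \times N$ $\mathbf{A}$, $\tau \in \mathbb{C}$ and $\mathbf{r} \in \mathbb{C}^N$ for which $\mathbf{A}$ and $\mathbf{A} + \tau\mathbf{r}\mathbf{r}^H$ are invertible,
\[ \mathbf{r}^H(\mathbf{A} + \tau\mathbf{r}\mathbf{r}^H)^{-1} = \frac{1}{1 + \tau\mathbf{r}^H\mathbf{A}^{-1}\mathbf{r}}\mathbf{r}^H\mathbf{A}^{-1} \tag{236} \]
This result follows from $\mathbf{r}^H\mathbf{A}^{-1}(\mathbf{A} + \tau\mathbf{r}\mathbf{r}^H) = (1 + \tau\mathbf{r}^H\mathbf{A}^{-1}\mathbf{r})\mathbf{r}^H$.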
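Lemma 4 is a rank-one perturbation identity and is easy to verify numerically. The following sketch uses an arbitrary $2 \times 2$ complex matrix `A`, vector `r` and scalar `tau` (all hypothetical test values) and a hand-written $2 \times 2$ inverse, so that no external library is needed.

```python
def inv2(m):
    """Inverse of a 2x2 complex matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def row_times(row, m):
    """Row vector (length 2) times a 2x2 matrix."""
    return [row[0] * m[0][j] + row[1] * m[1][j] for j in range(2)]

A = [[2 + 1j, 0.3], [-0.1j, 1.5]]   # arbitrary invertible matrix
r = [1 + 2j, -0.5j]                 # arbitrary vector
tau = 0.7 - 0.2j                    # arbitrary complex scalar
rH = [v.conjugate() for v in r]

# Perturbed matrix A + tau * r r^H.
A_pert = [[A[i][j] + tau * r[i] * rH[j] for j in range(2)] for i in range(2)]

# Left-hand side of (236).
lhs = row_times(rH, inv2(A_pert))

# Right-hand side of (236): r^H A^{-1} scaled by 1 / (1 + tau r^H A^{-1} r).
rH_Ainv = row_times(rH, inv2(A))
denom = 1 + tau * (rH_Ainv[0] * r[0] + rH_Ainv[1] * r[1])
rhs = [v / denom for v in rH_Ainv]

err = max(abs(p - q) for p, q in zip(lhs, rhs))
```

The same scalar denominator $1 + \tau\mathbf{r}^H\mathbf{A}^{-1}\mathbf{r}$ is what appears in (148) after each rank-one term $\tau_{k,j}\mathbf{y}_{k,j}\mathbf{y}_{k,j}^H$ is removed from $\mathbf{B}_N$.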
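Moreover, we recall Lemma 2.6 of [4].
Lemma 5: Let $z \in \mathbb{C}^+$ with $v = \Im[z]$, $\mathbf{A}$ and $\mathbf{B}$ $N \times N$ with $\mathbf{B}$ Hermitian, and $\mathbf{r} \in \mathbb{C}^N$. Then
\[ \left|\operatorname{tr}\left((\mathbf{B} - z\mathbf{I}_N)^{-1} - (\mathbf{B} + \mathbf{r}\mathbf{r}^H - z\mathbf{I}_N)^{-1}\right)\mathbf{A}\right| = \left|\frac{\mathbf{r}^H(\mathbf{B} - z\mathbf{I}_N)^{-1}\mathbf{A}(\mathbf{B} - z\mathbf{I}_N)^{-1}\mathbf{r}}{1 + \mathbf{r}^H(\mathbf{B} - z\mathbf{I}_N)^{-1}\mathbf{r}}\right| \le \frac{\|\mathbf{A}\|}{v}. \tag{237} \]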
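From Lemma 2.2 of [17], and Theorems A.2, A.4, A.5 of [18], we have the following.
Lemma 6: If $f$ is analytic on $\mathbb{C}^+$, both $f(z)$ and $zf(z)$ map $\mathbb{C}^+$ into $\mathbb{C}^+$, and there exists a $\theta \in (0, \pi/2)$ for which $zf(z) \to c$, finite, as $z \to \infty$ restricted to $\{w \in \mathbb{C} : \theta < \arg w < \pi - \theta\}$, then $c < 0$ and $f$ is the Stieltjes transform of a measure on the nonnegative reals with total mass $-c$.
Also, from [4], we need
Lemma 7: Let $\mathbf{y} = (y_1, \ldots, y_N)^T$ with the $y_i$'s i.i.d. such that $\mathrm{E}y_1 = 0$, $\mathrm{E}|y_1|^2 = 1$ and $|y_1| \le \log N$, and $\mathbf{A}$ an $N \times N$ matrix, then
\[ \mathrm{E}|\mathbf{y}^H\mathbf{A}\mathbf{y}|^6 \le K\|\mathbf{A}\|^6 N^3\log^{12}N \tag{238} \]
where $K$ does not depend on $N$, $\mathbf{A}$, nor on the distribution of $y_1$.
Additionally, we need
Lemma 8: Let $\mathbf{D} = \mathbf{A} + i\mathbf{B} + iv\mathbf{I}$, where $\mathbf{A}$, $\mathbf{B}$ are $N \times N$ Hermitian, $\mathbf{B}$ is also positive semi-definite, and $v > 0$. Then $\|\mathbf{D}^{-1}\| \le v^{-1}$.
Proof: We have $\mathbf{D}\mathbf{D}^H = (\mathbf{A} + i\mathbf{B})(\mathbf{A} - i\mathbf{B}) + v^2\mathbf{I} + 2v\mathbf{B}$. Therefore the eigenvalues of $\mathbf{D}\mathbf{D}^H$ are greater than or equal to $v^2$, which implies the singular values of $\mathbf{D}$ are greater than or equal to $v$, so that the singular values of $\mathbf{D}^{-1}$ are less than or equal to $v^{-1}$. We therefore get our result.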
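From Theorem 2.1 of [28],
Lemma 9: Let $\rho(\mathbf{C})$ denote the spectral radius of the $N \times N$ matrix $\mathbf{C}$ (the largest of the absolute values of the eigenvalues of $\mathbf{C}$). If $\mathbf{x}, \mathbf{b} \in \mathbb{R}^N$ with the components of $\mathbf{C}$, $\mathbf{x}$, and $\mathbf{b}$ all positive, then the equation $\mathbf{x} = \mathbf{C}\mathbf{x} + \mathbf{b}$ implies $\rho(\mathbf{C}) < 1$.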
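Lemma 9 is what turns the positivity of the quantities in (158) into the bound $\rho(\mathbf{C}^0) < 1$. As a small illustration, assuming an arbitrary positive $2 \times 2$ matrix `C` and positive vector `x` (hypothetical values chosen so that $\mathbf{b} = \mathbf{x} - \mathbf{C}\mathbf{x}$ is also positive), the spectral radius can be estimated by power iteration, which converges to the Perron root for matrices with positive entries.

```python
def matvec(M, x):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def spectral_radius_positive(M, iters=500):
    """Power iteration; converges to the Perron root when all entries are positive."""
    x = [1.0] * len(M)
    lam = 0.0
    for _ in range(iters):
        y = matvec(M, x)
        lam = max(y)              # x is normalised so that max(x) = 1
        x = [v / lam for v in y]
    return lam

C = [[0.3, 0.4], [0.1, 0.3]]      # positive entries
x = [1.0, 1.0]                    # positive vector
b = [xi - yi for xi, yi in zip(x, matvec(C, x))]   # b = x - Cx, here [0.3, 0.6]
rho = spectral_radius_positive(C)

# Because rho < 1, iterating x <- Cx + b from zero recovers the positive solution x.
approx = [0.0, 0.0]
for _ in range(300):
    approx = [ci + bi for ci, bi in zip(matvec(C, approx), b)]
```

Here the eigenvalues of `C` are 0.5 and 0.1, so `rho` is 0.5, and the Neumann-type iteration returns to the vector `x` that generated `b`.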
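From Theorem 8.1.18 of [29],
Lemma 10: Suppose $\mathbf{A} = (a_{ij})$ and $\mathbf{B} = (b_{ij})$ are $N \times N$ with $b_{ij}$ nonnegative and $|a_{ij}| \le b_{ij}$. Then
\[ \rho(\mathbf{A}) \le \rho((|a_{ij}|)) \le \rho(\mathbf{B}) \tag{239} \]
Also, from Lemma 5.7.9 of [30],
Lemma 11: Let $\mathbf{A} = (a_{ij})$ and $\mathbf{B} = (b_{ij})$ be $N \times N$ with $a_{ij}$, $b_{ij}$ nonnegative. Then
\[ \rho\left(\left(a_{ij}^{\frac12}b_{ij}^{\frac12}\right)\right) \le (\rho(\mathbf{A}))^{\frac12}(\rho(\mathbf{B}))^{\frac12} \tag{240} \]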
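Lemmas 10 and 11 are the two comparison results behind (164) and (189). The sketch below checks both inequalities on arbitrary $2 \times 2$ examples (the matrices are hypothetical test data), using the closed-form roots of the characteristic polynomial so that complex eigenvalues are handled without any numerical library.

```python
import cmath
import math

def rho2(m):
    """Spectral radius of a 2x2 (possibly complex) matrix from its characteristic polynomial."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + root) / 2), abs((tr - root) / 2))

# Lemma 10: entrywise |a_ij| <= b_ij.
A = [[1, -2], [3, 4]]
absA = [[abs(v) for v in row] for row in A]
B = [[1, 2], [3, 5]]
lemma10 = (rho2(A), rho2(absA), rho2(B))

# Lemma 11: entrywise geometric mean of two nonnegative matrices.
P = [[1, 2], [3, 4]]
Q = [[4, 1], [2, 3]]
G = [[math.sqrt(P[i][j] * Q[i][j]) for j in range(2)] for i in range(2)]
lemma11 = (rho2(G), math.sqrt(rho2(P) * rho2(Q)))
```

In the proof, `P` and `Q` play the roles of $\mathbf{C}^0$ and $\underline{\mathbf{C}}^0$ (or $\mathbf{C}(N)$ and $\mathbf{C}^0(N)$), and `G` is the entrywise bound obtained from Cauchy-Schwarz.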
And, from Theorems 8.2.2 and 8.3.1 of [29],
Lemma 12: If $\mathbf{C}$ is a square matrix with nonnegative entries, then $\rho(\mathbf{C})$ is an eigenvalue of $\mathbf{C}$ having an eigenvector $\mathbf{x}$ with nonnegative entries. Moreover, if the entries of $\mathbf{C}$ are all positive, then $\rho(\mathbf{C}) > 0$ and the entries of $\mathbf{x}$ are all positive.
From [30], we also need Theorem 6.1.1,
Lemma 13 (Geršgorin's Theorem): All the eigenvalues of an $N \times N$ matrix $\mathbf{A} = (a_{ij})$ lie in the union of the $N$ disks in the complex plane, the $i$th disk having center $a_{ii}$ and radius $\sum_{j \ne i}|a_{ij}|$.
Theorem 3.42 of [16],
Lemma 14 (Rouché's Theorem): If $f(z)$ and $g(z)$ are analytic inside and on a closed contour $\mathcal{C}$ of the complex plane, and $|g(z)| < |f(z)|$ on $\mathcal{C}$, then $f(z)$ and $f(z) + g(z)$ have the same number of zeros inside $\mathcal{C}$.
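As a small illustration of Lemma 13 above (Geršgorin's theorem), the disks can be computed directly from the rows of a matrix and compared with its eigenvalues. The $2 \times 2$ complex matrix below is an arbitrary test case, and the eigenvalues are obtained from the characteristic polynomial rather than from a library routine.

```python
import cmath

def gershgorin_disks(m):
    """Return (centre, radius) for each row: centre a_ii, radius sum_{j != i} |a_ij|."""
    n = len(m)
    return [(m[i][i], sum(abs(m[i][j]) for j in range(n) if j != i))
            for i in range(n)]

def eig2(m):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + root) / 2, (tr - root) / 2]

A = [[4, 1], [2, -1 + 1j]]
disks = gershgorin_disks(A)

# Each eigenvalue must fall in at least one disk (small slack for rounding).
covered = [any(abs(lam - centre) <= radius + 1e-12 for centre, radius in disks)
           for lam in eig2(A)]
```

In the proof this localisation is what bounds $\rho(\mathbf{C}(N))$ by $K_1\log^4 N$ once the entries of $\mathbf{C}(N)$ are controlled by (175).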
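In order to prove Theorem 2, we also need, from [32],
Lemma 15: Consider a rectangular matrix $\mathbf{A}$ and let $s_i^{\mathbf{A}}$ denote the $i$th largest singular value of $\mathbf{A}$, with $s_i^{\mathbf{A}} = 0$ whenever $i > \operatorname{rank}(\mathbf{A})$. Let $m$, $n$ be arbitrary non-negative integers. Then for $\mathbf{A}$, $\mathbf{B}$ rectangular of the same size
\[ s_{m+n+1}^{\mathbf{A}+\mathbf{B}} \le s_{m+1}^{\mathbf{A}} + s_{n+1}^{\mathbf{B}} \tag{241} \]
And for $\mathbf{A}$, $\mathbf{B}$ rectangular for which $\mathbf{AB}$ is defined
\[ s_{m+n+1}^{\mathbf{AB}} \le s_{m+1}^{\mathbf{A}}s_{n+1}^{\mathbf{B}} \tag{242} \]
As a corollary, for any integer $r \ge 0$ and rectangular matrices $\mathbf{A}_1, \ldots, \mathbf{A}_K$, all of the same size,
\[ s_{Kr+1}^{\mathbf{A}_1 + \cdots + \mathbf{A}_K} \le s_{r+1}^{\mathbf{A}_1} + \cdots + s_{r+1}^{\mathbf{A}_K} \tag{243} \]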
ACKNOWLEDGMENT
Silverstein’s work is supported by the U.S. Army Research Office under Grant W911NF-05-1-0244. Debbah’s
work is partially supported by the European Commission in the framework of the FP7 Network of Excellence in
Wireless Communications NEWCOM++.
The authors would like to thank Walid Hachem for the fruitful discussions we shared concerning some results
of this paper.
REFERENCES
[1] S. Sesia, I. Toufik and M. Baker, “LTE: The UMTS Long Term Evolution, From Theory to Practice,” Wiley and Sons, 2009.
[2] A. M. Tulino, A. Lozano and S. Verdu, “Impact of antenna correlation on the capacity of multiantenna channels,” IEEE Trans. on
Information Theory, vol. 51, no. 7, pp. 2491-2509, 2005.
[3] M. J. M. Peacock, I. B. Collings and M. L. Honig, “Eigenvalue distributions of sums and products of large random matrices via incremental
matrix expansions,” IEEE Trans. on Information Theory, vol. 54, no. 5, pp. 2123-2138, 2008.
[4] J. Silverstein and Z. Bai, “On the empirical distribution of eigenvalues of a class of large dimensional random matrices,” Journal of
Multivariate Analysis, vol. 54, issue 2, pp. 175–192, 1995
[5] G. Taricco, E. Riegler, “On the Ergodic Capacity of the Asymptotic Separately-Correlated Rician Fading MIMO Channel with Interference,”
IEEE International Symposium on Information Theory, pp. 531-535, 2007
[6] G. Foschini and M. Gans, “On Limits of Wireless Communications in a Fading Environment when Using Multiple Antennas,” Wireless
Personal Communications, vol. 6, no. 3, pp. 311-335, 1998.
[7] E. Telatar, “Capacity of multi-antenna Gaussian channels,” European transactions on telecommunications, vol. 10, no. 6, pp. 585-595,
1999.
[8] A. Paulraj, R. Nabar and D. Gore, “Introduction to Space-Time Wireless Communications,” Cambridge University Press, 2003.
[9] C. Chuah, D. Tse, J. Kahn, and R. Valenzuela, “Capacity Scaling in MIMO Wireless Systems under Correlated Fading,” IEEE Trans. on
Information Theory, pp. 637-650, 2002.
[10] A. Moustakas, S. Simon and A. Sengupta, “MIMO Capacity Through Correlated Channels in the Presence of Correlated Interferers and
Noise: A (Not So) Large N Analysis,” IEEE Trans. on Information Theory, vol. 49, no. 10, 2003.
[11] C. K. Wen, Y. N. Lee, J. T. Chen, and P. Ting, “Asymptotic spectral efficiency of MIMO multiple-access wireless systems exploring only
channel spatial correlations,” IEEE Transactions on Signal Processing, vol. 53, no. 6, pp. 2059-2073, 2005.
[12] D. Guo and S. Verdu, “Multiuser Detection and Statistical Physics,” Communications on Information and Network Security, Kluwer
Academic Publishers, 2003.
[13] S. Shamai and A. Wyner, “Information-Theoretic Considerations for Symmetric, Cellular, Multiple-Access Fading Channels - Part I,” IEEE
Trans. on Information Theory, vol. 43, no. 6, 1997.
[14] B. Zaidel, S. Shamai and S. Verdu, “Multicell Uplink Spectral Efficiency of Coded DS-CDMA With Random Signatures,” IEEE Journal
on Selected Areas in Communications, vol. 19, no. 8, 2001.
[15] Z. Bai and J. Silverstein, “On the signal-to-interference-ratio of CDMA systems in wireless communications,” Annals of Applied Probability,
vol. 17, no. 1, pp. 81-101, 2007.
[16] E. C. Titchmarsh, “The Theory of Functions,” Oxford University Press, 1939.
[17] J. A. Shohat and J. D. Tamarkin, “The Problem of Moments,” American Mathematical Society, Providence, RI, 1970.
[18] M. K. Krein and A. A. Nudelman, “The Markov Moment Problem and Extremal Problems,” American Mathematical Society, Providence,
RI, 1997.
[19] R. B. Dozier and J. W. Silverstein, “On the Empirical Distribution of Eigenvalues of Large Dimensional Information-Plus-Noise Type
Matrices,” Journal of Multivariate Analysis 98(4), pp. 678-694, 2007.
[20] S. Vishwanath, N. Jindal and A. Goldsmith, “Duality, Achievable Rates, and Sum-Rate Capacity of Gaussian MIMO Broadcast Channels,”
IEEE Trans. on Information Theory, vol. 49, no. 10, 2003.
[21] P. Viswanath and D. Tse, “Sum Capacity of the Multiple Antenna Gaussian Broadcast Channel and Uplink-Downlink Duality,” IEEE
Trans. on Information Theory, vol. 49, no. 8, pp. 1912-1923, 2003.
[22] S. Verdu, “Multiple-access channels with memory with and without frame synchronism,” IEEE Trans. on Information Theory, vol. 35, pp.
605-619, 1989.
[23] H. Weingarten, Y. Steinberg and S. Shamai, “The Capacity Region of the Gaussian Multiple-Input Multiple-Output Broadcast Channel,”
IEEE Trans. on Information Theory, vol. 52, no. 9, pp. 3936-3964, 2006.
[24] A. Soysal and S. Ulukus, “Optimality of Beamforming in Fading MIMO Multiple Access Channels,” to be published.
[25] A.M. Tulino, A. Lozano and S. Verdu, “Impact of antenna correlation on the capacity of multiantenna channels,” IEEE Trans. on Information
Theory, vol. 51, no. 7, pp. 2491-2509, 2005.
[26] W. Hachem, Ph. Loubaton and J. Najim, “Deterministic Equivalents for Certain Functionals of Large Random Matrices,” The Annals of
Applied Probability, Vol. 17, No. 3, 875-930, 2007.
[27] J. Dumont, S. Lasaulce, W. Hachem, Ph. Loubaton and J. Najim, “On the Capacity Achieving Covariance Matrix for Rician MIMO
Channels: An Asymptotic Approach”, submitted to IEEE Trans. on Information Theory, October 2007, revised in October 2008.
[28] E. Seneta, “Non-negative Matrices and Markov Chains,” Second Edition, Springer Verlag New York, 1981.
[29] R.A. Horn and C.R. Johnson, “Matrix Analysis,” Cambridge University Press, 1985.
[30] R.A. Horn and C.R. Johnson, “Topics in Matrix Analysis,” Cambridge University Press, 1991.
[31] Z. D. Bai and Y. Q. Yin, “Limit of the smallest eigenvalue of a large dimensional sample covariance matrix,” The Annals of Probability,
pp. 1275-1294, Institute of Mathematical Statistics, 1993.
[32] K. Fan, “Maximum properties and inequalities for the eigenvalues of completely continuous operators,” Proc. Nat. Acad. Sci. U.S.A. 37,
760-766, 1951.
[33] Z. Bai and J. W. Silverstein, “Spectral Analysis of Large Dimensional Random Matrices,” Science Press, Beijing, 2006.
[34] P. Billingsley, “Convergence of Probability Measures,” John Wiley & Sons, Second Revised Edition, 1999.
[35] M. Debbah and R. R. Muller, “MIMO channel modeling and the principle of maximum entropy,” IEEE Transactions on Information
Theory, vol. 51, no. 5, pp. 1667-1690, 2005.