dynamic proﬁt maximization of cognitive mobile virtual ... · secondary operator obtains...

arX

iv:1

212.

3979

v1 [

cs.N

I] 1

7 D

ec 2

012

1

Dynamic Profit Maximization of Cognitive MobileVirtual Network Operator

Shuqin Li, Member, IEEE, Jianwei Huang, Senior Member, IEEE, and Shuo-Yen Robert Li, Fellow, IEEE

Abstract —We study the profit maximization problem of a cognitive virtual network operator in a dynamic network environment. Weconsider a downlink OFDM communication system with various network dynamics, including dynamic user demands, uncertain sensingspectrum resources, dynamic spectrum prices, and time-varying channel conditions. In addition, heterogenous users and imperfectsensing technology are incorporated to make the network model more realistic. By exploring the special structural of the problem,we develop a low-complexity on-line control policies that determine pricing and resource scheduling without knowing the statistics ofdynamic network parameters. We show that the proposed algorithms can achieve arbitrarily close to the optimal profit with a propertrade-off with the queuing delay.

Index Terms —Cognitive Radio, Profit Maximization, Pricing, Virtual Network Operator.

✦

1 INTRODUCTION

The limited wireless spectrum is becoming a bottleneck formeeting today’s fast growing demands for wireless data ser-vices. More specifically, there is very little spectrum leftthatcan be licensed to new wireless services and applications.However, extensive field measurements [2] showed that muchof the licensed spectrum remains idle most of the time, evenin densely populated metropolitan areas such as New YorkCity and Chicago. A potential way to solve this dilemma is tomanage and utilize the licensed spectrum resource in a moreefficient way.

This is why the concept ofDynamic Spectrum Access(DSA) has received enthusiastic support from governmentsand industries worldwide [3]–[5]. We can roughly classifyvarious DSA approaches into two main categories: thespec-trum sensing based ones and thespectrum leasing (or market)based ones. The first category indicates a hierarchical accessmodel, where unlicensed secondary users opportunisticallyaccess the under-utilized part of the licensed spectrum, withcontrolled interference to the licensed primary users. Duringthis process, spectrum sensing helps the secondary users todetect the currently available spectrum resource. In contrast,the second category relates to a dynamic exclusive use model,which allows licensees to trade spectrum usage right to thesecondary users. In both categories, it is possible to have a

• Shuqin Li is with Research and Innovation, Alcatel-Lucent Shanghai BellCo., Ltd., D400, Bldg. 3, 388 Ningqiao Rd, Shanghai, 201206, China. E-mail: [email protected] This work was done when Shuqin Liwas in The Chinese University of Hong Kong.

• Jianwei Huang is with Department of Information Engineering, The Chi-nese University of Hong Kong. E-mail: [email protected]. JianweiHuang is the corresponding author.

• Shuo-Yen Robert Li is with both Department of Information Engineeringand Institute of Network Coding, The Chinese University of Hong Kong.E-mail: [email protected].

Part of the results have appeared in IEEE ICC 2012 [1]. This work is sup-ported by the General Research Funds (Project Number CUHK 412710 andCUHK 412511) and AoE established under the University Grant Committeeof the Hong Kong Special Administrative Region, China, and grants fromChina 973 Prog. No.2012CB315901 & 2012CB315904.

secondary operator coordinating the transmissions of multiplesecondary users.

There are pros and cons for both DSA categories. Spectrumsensing detects and identifies the available unused licensedspectrum through technologies such as beacons, geolocationsystem, and cognitive radio. Form the secondary operator’sperspective, the spectrum acquired by sensing isan unreliableresource, since it cannot determine how much resource isavailable before sensing. Furthermore, imperfect sensingmaylead to collisions with primary users, and thus reduce theincentives for the licensee to share the spectrum. Thereforethe secondary operator needs to carefully design sensing andaccess algorithm tocontrol the collision probability under anacceptable level. In dynamic spectrum leasing, a secondaryoperator acquires the exclusive right to use spectrum withina limited time period by paying the corresponding leasingprice. Thus the spectrum acquired by spectrum leasing is areliable resource. However, the cost can be high comparedto the spectrum sensing cost, and is dynamically changingaccording to the demand and supply relationship in the market.

In this paper, we will consider a hybrid model, where asecondary operator obtains resources from the primary li-censees through bothspectrum sensing anddynamic spectrumleasing, and provides services to the secondary unlicensedusers. Our study is motivated by [6], [7], in which theauthors introduced the new concept of Cognitive MobileVirtual Network Operator (C-MVNO). The C-MVNO is ageneralization of the existing business model of MVNO [8],which refers to the network operator who does not own alicensed frequency spectrum or even wireless infrastructure,but resells wireless services under its own brand name. TheMVNO business model has been very successful after morethan 10 years’ development, and there are more than 600MVNOs today [9], [10]. The C-MVNO model generalizes theMVNO model with DSA technologies, which allow the virtualoperator to obtain spectrum resources through both spectrumsensing and leasing. The C-MVNO model can be applied toa wild range of wireless scenarios. One example is the IEEE802.22 standard [11], which suggests that the cognitive radio

http://arxiv.org/abs/1212.3979v1

2

network using white space in TV spectrum will operate ona point to multipoint basis (i.e., a base station to customer-premises equipments). Such a secondary base station can beoperated by a C-MVNO.

The key difference between our work and the ones in[6], [7] is that we study a much more realistic dynamicnetwork in this paper. In [6], [7], the authors formulated theproblem based on a static network scenario, and providedinteresting equilibrium results through a one-shot Stackelberggame. However, the real network is highly dynamic. Forexample, users arrive and leave the systems randomly, thestatistics of spectrum availability changes over time, andthespectrum-sensing results are imperfect. Also the leasing priceis often unpredictable and changing from time to time. Thesedynamics and realistic concerns make the network model andthe corresponding analysis rather challenging.

In this paper, we focus on the profit maximization problemfor C-MVNO in a dynamic network scenario. Our key resultsand contributions are summarized as follows.

• A dynamic network decision model: Our model incor-porates various key dynamic aspects of a cognitive radionetwork and the dynamic decision process of a C-MVNO.We model sensing channel availability, leasing marketprice, and channel conditions as exogenous stochasticprocesses.

• Dynamic user demands: We allow users to dynamicallyjoin the network with random demands (file sizes). Thedemand is affected by both the transmission prices (deci-sion variables) and market states (exogenous stochastics).

• Realistic cognitive radio model: We incorporate variouspractical issues such as imperfect spectrum sensing, pri-mary users’ collision tolerance, and sensing technologyselection. The operator needs to choose a sensing tech-nology to trade-off between cost and performance.

• A low-complexity on-line control policy: By exploitingthe special structure of the problem, we design a low-complexity on-line pricing and resource allocation policy,which can achieve arbitrarily close to the operator’soptimal profit. The policy does not require precise in-formation of the dynamic network parameters, has a lowsystem overhead, and is easy to implement.

The remainder of the paper is organized as follows. InSection II, we introduce the related work. In Section 3, weintroduce the system model. Section 4 describes the problemformulation. In Section 5, we propose the profit maximizationcontrol (PMC) policy for homogeneous users and analyze itsperformance. We further extend profit maximization controlpolicy (M-PMC policy) to heterogeneous users in Section 6.Section 7 provides simulation results for both PMC and M-PMC polices. Finally, we conclude the paper in Section 8.

2 RELATED WORK

Among the vast literature on cognitive radio, we will focuson the results onoperator-oriented cognitive radio networks,where secondary operators play key roles in terms of coordi-nating the transmissions of the secondary users. These studiesonly started to emerge recently, e.g., [6], [7], [12]–[21].We

can further classify these studies into two clusters: monopolymodels with one operator, and oligopoly models with multipleoperators.

References [6], [7], [12], [13] studied monopoly modelsusing the Stackelberg game formulation. Daoud et al. in [12]proposed a profit-maximizing pricing strategy for uplink powercontrol problem in wide-band cognitive radio networks. Yu etal. in [13] proposed a pricing scheme that can guarantee afair and efficient power allocation among the secondary users.

References [14]–[21] looked at the oligopoly issues, eitherbetween two operators [14], [15] or among many operators[16]–[21]. For the case of two operators, Jia and Zhang in[14] proposed a non-cooperative two-stage game model tostudy the duopoly competition. Duan et al. in [15] formulatedthe economic interaction among the spectrum owner, twosecondary operators and the users as a three-stage game. Forthe case of many operators, Ileri et al. in [16] developed anon-cooperative game to model competition of operators in amixed commons/property-rights regime under the regulation ofa spectrum policy server. Elias and Martignon in [17] showedthat polynomial pricing functions lead to unique and efficientNash equilibrium for the two-stage Stackelberg game betweennetwork operators and secondary users. Niyato et al. in [18]formulated an evolutionary game for modeling the dynamics ofa multiple-seller, multiple-buyer spectrum trading market. Inaddition, several auction mechanisms were proposed to studythe investment problems of cognitive network operators (e.g.,[19]–[21]).

All results mentioned above considered a rather staticnetwork model. In contrast, our work adopts a dynamicnetwork model to characterize the stochastic nature of wirelessnetworks. We will focus on a monopoly model in this paper.

In this paper, we use Lyapunov stochastic optimizationto show the optimality and stability of the proposed profitmaximizing control algorithms. Several closely related pre-vious results applying Lyaunov stochastic optimization towireless networks include [22]–[24]. Huang and Neely in [22]considered revenue maximization problem for a conventionalwireless access point without considering the cognitive radiotechnologies. Urgaonkar and Neely in [23] and Lotfinezhadet al. in [24] studied cognitive radio networks based ona user-oriented approach, by designing joint scheduling andresource allocation algorithms to maximize the utility of agroup of secondary users. Our paper focused on an operator-oriented approach to address profit maximization problem. Inparticular, we need to deal with the combinatorial problem ofchannel selection and channel assignment that usually leads toa high computational complexity. By discovering and utilizingthe special problem structure, we design a low-complexityalgorithm that is suitable for online implementation.

3 SYSTEM MODELConsider a C-MVNO that provides wireless communicationsservices to its own secondary users by acquiring spectrumresource from some spectrum owner. For example, Googlemay acquire spectrum from AT&T to provide its own wirelessservices through the C-MVNO model. The spectrum owner’sspectrum can be divided into two types: thesensing band and

3

the leasing band. In the sensing band, AT&T serves its ownprimary users, but allows Google to identify available spec-trum in this band through spectrum sensing without explicitcommunications with AT&T. In the leasing band, AT&T willdoes not allow spectrum sensing, and will lease the band toGoogle for economic returns.

More specifically, we consider a time-slotted OFDM sys-tem, where the C-MVNO serves the downlink transmissionsfrom its base station to the secondary users. The system modelis illustrated in Fig. 1. Secondary users randomly arrive atthe secondary network and request files with random sizesto be downloaded from the base station. This requested filesare queued at the server in the base station until they aresuccessfully transmitted to the requesting users.

� �

��

�

��

� � � � �

��

��

��

��

��

��

� �

��

� � � � ��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

!�� "��

#$� ��%

��

��

&��

��

( )l tB( )s tB

max

sB max

lB

Fig. 1. Business model of the operator (Cognitive VirtualNetwork Operator).

The rest of the section introduces each part of the systemmodel in more details. The C-MVNO (or “operator” forsimplicity) obtains wireless channels through spectrum sensing(Sections 3.1 and 3.2) and spectrum leasing (Section 3.3), andallocates power over the obtained channels (Section 3.4). Sec-ondary users dynamically arrive and request file downloadingservices (based on the demand model in Section 3.5), and wemodel the requests as a queue (Section 3.6).

3.1 Imperfect Spectrum Sensing

Sensing bandBsmax

∆= {1, . . . , Bs

max} includes all channelsthat the spectrum owner allows sensing by the operator.1 Wedefinethe state of a channeli ∈ Bs

max(t) in time slot t asSi(t), which equals 0 if channeli is busy (being used by aprimary user), and equals 1 if channeli is idle.

We assume thatSi(t) is an i.i.d. Bernoulli random variable,with an idle probabilityp0 ∈ (0, 1) and a busy probability1− p0. This approximates the reality well if the time slots forsecondary transmissions are sufficiently long or the primarytransmissions are highly bursty [27]. (We will further study

1. The operator will collect the sensing information from a sensor networkor geolocation database and provide it to its users,i.e., providing “sensingas service” [25], [26]. This means that the network can accommodatelegacy mobile devices without cognitive radio capabilities. For more detaileddiscussions, see [7].

the general Markovian model in Section 5.5.) We define thesensing state of a channeli ∈ Bs

max in time slot t asWi(t),which equals to 0 if channeli is sensed busy, and 1 if sensedidle.

Notice thatWi(t) may not equal toSi(t) due to imperfectsensing. The accuracy of spectrum sensing depends on thesensing technology [28]. If we denoteCs as the sensing cost(per channel)2, then we can write the false alarm probability asPfa(C

s)∆= Pr{Wi = 0|Si = 1} (same for all channeli) and

the missed detection probability asPmd(Cs)

∆= Pr{Wi =

1|Si = 0} (same for all channeli). Both functions aredecreasing inCs. Intuitively, a better technology will havea higher costCs, a lower false alarm probabilityPfa(C

s),and a lower missed detection probabilityPmd(C

s). We denoteall choices of costCs (and thus the corresponding sensingtechnologies) by a finite setCs.

As different channels have different conditions (to be ex-plained in details in Section 3.4), the operator needs to decidewhich channels to sense at the beginning of each time slot.We useBs(t) to denotethe set of channels sensed by theoperator at time t, which satisfies

Bs(t) ⊆ Bsmax, ∀ t. (1)

3.2 Collision ConstraintMissed detections in spectrum sensing lead to transmissioncollisions with the primary users. We denote thecollision inchannel i ∈ Bs

max at time t as a binary random variableXi(t) ∈ {0, 1}. We haveXi(t)

∆= (1 − Si(t))Wi(t), i.e., the

collision happens if and only if the channel is busy but issensed idle.

To protect primary users’ transmissions, the operator needsto ensure that the average collision in each channeli does notexceed a tolerable levelηi (measured in terms of the averagenumber of collisions per unit time) specified by the spectrumowner. The tolerable levelηi can be channel specific, since theprimary users in different channels may have different QoSrequirements. We define the time-average number of collisionin channeli asXi

∆= limt→∞

1t

∑t−1τ=0 E [Xi(τ)]. The collision

constraints are

Xi ≤ ηi, ∀i ∈ Bsmax(t). (2)

3.3 Spectrum Leasing with a Dynamic Market PriceA spectrum owner may have some channels that do not want tobe sensed, for either privacy reasons or the fear of collisionsdue to sensing errors. However, these channels may not bealways fully utilized. The spectrum owner can lease the unusedpart of these channels to the operator dynamically over timeto earn more revenue. Recall that we denote the set of thesechannels as the leasing bandBl

max∆= {1, . . . , Bl

max}. (Ingeneral, we may represent it asBl

max(t), since our modelallows leasing band to be time-varying. For the simplicity ofnotations, we denote it asBl

max whenever it is clear.) We useBli(t) to denotethe set of channels leased by the operator

at time t, which satisfies

Bl(t) ⊆ Blmax, ∀ t. (3)

2. The cost corresponds to, for example, power or time used for sensing.

4

These channels will be exclusively used by the operator in thecurrent time slot. We denote the leasing price per channel asCl(t), which stochastically changes according to the supplyand demand relationship in the spectrum market (which mightinvolve many spectrum owners and operators). It can bemodeled by an exogenous (not affected by this particularoperator’s decisions) random process with countable discretestates and stationary distribution (not necessarily knownbythe operator).

3.4 Power AllocationIn wireless network, there are usually channel fading dueto multipath propagation or shadowing from obstacles. Tocombat channel fading, it is necessary for the operator to doproper power allocation in both sensing channels and leasingchannels to achieve satisfactory data rates. For each channeli ∈ Bmax

∆= Bs

max ∪ Blmax, hi(t) represents its channel gain

in time slot t and follows an i.i.d. distribution over time.Different channels have independent and possibly differentchannel gain distributions. We assume that secondary usersare homogeneous and experience the same channel conditionfor the same channel. But channel conditions can be differentin different channels.3 (The heterogenous user scenario willbe further discussed in Section 6.) The operator can measurehi(t) for eachi at the beginning of each slott, but may notknow the distributions. LetPi(t) denote thepower allocatedto channel i at time t. Since we consider a downlink casehere, the operator needs to satisfy thetotal power constraintPmax at its base station,∑

i∈Bmax

Pi(t) ≤ Pmax, ∀t. (4)

In addition, for a channeli ∈ Bsmax in the sensing band,

we use the binary variableIi(t)∆= Si(t)Wi(t) to denote the

transmission result of a secondary user,i.e., Ii(t) = 1 ifsuccessful (i.e., Si(t) = 1 and Wi(t) = 1) and Ii(t) = 0otherwise (either not sensed, or sensed busy, or sensed idlebut actually busy). Based on the discussion of the leasingagreement, we haveIi(t) ≡ 1, i ∈ Bl

max for all channel inleasing band. Then the rate in channeli at time slott is (basedon the Shannon formula)

ri(t) = Ii(t) log2(1 + hi(t)Pi(t)), (5)

and total transmission rate obtained by the operator is

r(t) =∑

i∈Bs(t)∩Bl(t)

ri(t). (6)

Furthermore, we assume that the operator has a finitemaximum transmission rate,i.e., r(t) ≤ rmax, ∀t, under anyfeasible power allocation.

3.5 Demand ModelWe will focus on elastic data traffic in this paper. Secondaryusers randomly arrive at the network to request files withrandom and finite file sizes (measured in the number of

3. This is the case where the users are located close by, and thus thedownlink channel condition from the base station to the users is userindependent.

packets) from the operator. A user will leave the networkonce it has downloaded the complete requested file. Theoperator can price the packet transmission dynamically overtime, which will affect the users’ arrival rate. For example, ahigher price at peak time can refrain users from downloadingfiles, as they can wait until a later time with a lower price.To model this, we useM(t) to denote the randommarketstate, which can be measured precisely at the beginning ofeach time slott and can help estimate the users demand4. Therandom variable is drawn from a finite setM over time in ani.i.d. fashion. The distribution ofM(t) may not be known bythe operator.

At a time t, the operator will decide whether to accept newfile downloading requests from newly arrived secondary users.We define thebinary demand control variable as O(t),whereO(t) = 1 means that the operator accepts the incomingrequests in timet, andO(t) = 0 otherwise. When the operatordecides to accept new requests of packet transmissions, it willalso announcea price q(t) for transmitting one packet(to any user). This price will affect the users’ incentives ofdownloading requests,e.g., when priceq(t) is high, some usersmay choose to postpone their requests.

More precisely, we denote the number of incoming users attime t as a discrete random variableN(t)

∆= N (M(t), q(t)) ∈

{0, 1, 2 . . .}, the distribution of which is a function of thetransmission priceq(t) and market stateM(t). Further, auser n’s requested file size is denotedLn(t), with n ∈{1, 2, . . . , N(q(t))}, which is assumed to be independent ofeach other and does not depend onq(t) or M(t). Moreover,we assume that users are using a setK = {1, 2, . . . ,K} ofdifferent applications, and denoteθk as the probability that anincoming user is using applicationk ∈ K with

∑K

k=1 θk = 1.The distributions of the file length for different applicationscan be different, and we denotelk as the expected file lengthof applicationk ∈ K.

To summarize,users’ instantaneous demand at timet is

A(t)∆=

N(M(t),q(t))∑

n=1

Ln(t), (7)

which is a random variable due to random file sizes andthe random number of incoming users (even givenq(t) andM(t)). We define theusers’ (expected) demand functionas D(t)

∆= D (M(t), q(t))

∆= E [A (M(t), q(t))], and its

value is completely determined byM(t) and q(t). We cancalculate thatD(M(t), q(t)) = E [N(M(t), q(t))]

∑k∈K θklk.

Then it is reasonable to assume that the operator can ratheraccurately characterize the expected number of incoming usersE [N(M(t), q(t))] through long-term observations. Thus thedemand functionD(M(t), q(t)) is known by the operator.We further assume that the instantaneous demand is upper-bounded asA(t) ≤ Amax for all t, and that the demandfunctionD(t) is non-negative and non-increasing function ofthe price q(t). When the price is higher than some upper-bound,i.e., q(t) ≥ qmax, the demand functionD(t) will bezero. The optimization ofO(t) andq(t) based on the demandfunction will be further discussed in Section 5.2.1.

4. For example,M(t) can be users’ willingness to pays, or whether thesystem is in peak time or off-peak time.

5

3.6 Queuing dynamicsSince we focus on the profit maximization problem in thispaper, we will take a simple view of the network and modelusers’ dynamic arrivals and departures as a single server queue.When a user accesses the network, the corresponding filewill be queued in a server at the base station, waiting tobe transmitted to the user according to the First Come FirstServe (FCFS) discipline. Shama and Lin in [29] showed thatthe single server queue model is a good approximation foran OFDM system, especially when the number of users andchannels are large.

We denote thequeue length (i.e., the backlog, or thenumber of all packets from all queued files) at timet asQ(t).Thus the queuing dynamic can be written as

Q(t+ 1) =(Q(t)− r(t)

)++O(t)A(t), (8)

where(a)+∆= max(a, 0), r(t) andA(t) are the transmission

rate and incoming rate at timet, andO(t) is the binary demandcontrol variable (i.e., O(t) = 1 means the operator admit theusers’ transmission requests at timet). Throughout the paper,we adopt the following notion ofqueue stability:

Q∆= lim sup

t→∞

1

t

t−1∑

τ=0

E [Q(τ)] <∞. (9)

4 PROBLEM FORMULATION

For notation convenience, we introduce several condensednotations and use them together with the original notations.

We defineφ(t)∆= (M(t),h(t), Cl(t)) as observable pa-

rameters, including the market stateM(t), channel conditions(vector) h(t), and the leasing priceCl(t) in the spectrummarket. Based on previous assumptions,φ(t)’s are i.i.d overtime and take values from a finite setΦ.

We defineγ(t)∆= (O(t), q(t), Cs(t),Bs(t),Bl(t),P (t)) as

decision variables, including the demand control variableO(t), the transmission price for usersq(t), the sensing cost(with the corresponding sensing technology)Cs(t), the set ofsensing channelsBs(t), the set of leasing channelsBl(t), andpower allocations (vector)P (t) of the operator. We assumethat γ(t) takes values form a countable (finite or infinite) setΓφ(t), which is a Cartesian product of the feasible regionsof all variables,i.e., non-negative values satisfying constraints(1), (3), and (4). With the condensed notations, functions inthis paper can be simply represented as functions ofγ(t) withparameterφ(t).

We further define theinstantaneous profit in time t

R(t)∆= R(γ(t);φ(t))∆= q(t)O(t)A(t) − Cs(t)|Bs(t)| − Cl(t)|Bl(t)|. (10)

The time average profit is denoted as

R∆= lim sup

t→∞

1

t

t−1∑

τ=0

E [R(t)].

All expectations in this paper are taken with respect to systemparametersφ(t) unless stated otherwise.

We look at the profit maximization problem through pricingdetermination and resource allocations. At the beginning of

each time slott, the operator observes the value ofφ(t) andmakes a decisionγ(t) to maximize the time average profit,subject to the system stability constraint (11) and the collisionupper-bound requirement (12). The Profit Maximization (PM)problem is formulated as

PM: Maximize R

Subject to Q <∞, (11)

Xi ≤ ηi, i ∈ Bsmax, (12)

Variables γ(t) ∈ Γφ(t), ∀t,

Parametersφ(t), ∀t.

We represent its optimal solution asγ∗(t) =(O∗(t), q∗(t), Cs∗(t),Bs∗(t),Bl∗(t),P ∗(t)

), and denote

R∗

as the maximum profit. The PM problem is an infinitehorizon stochastic optimization problem, which is in generalhard to solve directly, especially when the distribution ofdynamic parameterφ(t) is unknown. For example, the futureleasing price is hard to predict due to the dynamic suppliesand demands in the market; and the primary users’ activitiescan not be estimated precisely before hand.

5 PROFIT MAXIMIZATION CONTROL POLICYNow, we adopt Lyapunov stochastic optimization technique tosolve the PM problem.

5.1 Lyapunov stochastic optimizationWe first introduce a virtual queue for constraint (12), and thenderive the optimal control policy to solve the PM problemthrough the technique of drift-plus-penalty function minimiza-tion [30].

We denoteZi(t) as the number of collisions happeningin sensing channeli ∈ Bs

max. The counterZi(t) can beunderstood as a “virtual queue”, in which the incoming rate isXi(t), and the serving rate isηi (the collision tolerant level).The queue dynamic is

Zi(t+ 1) =(Zi(t)− ηi

)++Xi(t), (13)

with Zi(0) = 0. By this notion, if the virtual queue is stable,then it implies that the average incoming rate is no larger thanthe average serving rate. This is just the same as the collisionupper-bound constraint (12).

We introduce the general queue length vectorΘ(t)∆=

{Q(t),Z(t)}. We then define the Lyapunov function

L(Θ(t))∆=

1

2

[Q(t)2 +

∑

i∈Bsmax

Zi(t)2],

and the Lyapunov drift

∆(Θ(t))∆= E

[L(Θ(t+ 1))− L(Θ(t))|Θ(t)

]. (14)

According to the Lyapunov stochastic optimization tech-nique, we can obtain instantaneous control policy that cansolve the PM problem though minimizing some upper boundof the following drift-plus-penalty function in every slott:

∆(Θ(t))− V E [R(t)|Θ(t)]. (15)

There are two terms in the above function. The first term isthe Lyapunov drift defined in (14). It is shown by Lyapunov

6

stochastic optimization [30] that we can achieve the systemstabilities (i.e., constraints (11) and (12) of the PM problem)by showing the existence of a constant upper bound for thedrift function. The second term in (15) is just the objectiveofthe PM problem,i.e., to minimize the minus profit, whichis equivalent to maximize the profit. Here parameterV isintroduced to achieve the desired tradeoff between profit andqueuing delay in the control policy. We first find an upperbound for (15).

By the queue dynamic (8), we have

Q(t+1)2 ≤ (Q(t)−r(t))2+A(t)2+2Q(t)O(t)A(t)

= Q(t)2+r(t)2 +A(t)2+2Q(t)(O(t)A(t)−r(t)) . (16)

Similarly, for virtual queue (13), we have

Zi(t+1)2 ≤ Zi(t)2+η2i +Xi(t)

2+2Zi(t)(Xi(t)−ηi). (17)

Substituting (16) and (17) into (15), we have

∆(Θ(t))− V E [R(t)|Θ(t)] ≤ D −D1(t)

− V E[q(t)O(t)A(t)−Cs(t)|Bs(t)|−Cl(t)|Bl(t)||Θ(t)

]

+Q(t)E [O(t)A(t) − r(t)|Θ(t)] +∑

i∈Bsmax

Zi(t)E [Xi(t)|Θ(t)]

whereD is a positive constant satisfying the following condi-tion for all t,

D≥1

2

E[r(t)2+(O(t)A(t))2|Θ(t)

]+∑

i∈Bsmax

E

[Xi(t)

2+η2i |Θ(t)],

andD1(t)∆=∑

i∈Bsmax

Zi(t)ηi is a known constant at timet,since the values ofZi(t)s are known at timet.

To further simplify the above expression, we introduce twonew notations: channel selectionB(t)

∆= Bs(t) ∪ Bl(t), and

channel cost

Ci(t)∆=

{Cs(t), if i ∈ Bs(t),

Cl(t), if i ∈ Bl(t),(18)

where Cs(t) is the virtual sensing cost and is defined asCs(t)

∆= Cs(t) + (1/V )Zi(t)E [Xi(t)|Θ(t)]. Note that this

virtual sensing cost depends not only on the sensing cost butalso on the collision history in this channel. More frequentpastcollisions in this channel will increase the virtual sensing cost,hence makes the operator more conservative about choosingthis sensing channel. It then follows:

∆(Θ(t))− V E [R(t)|Θ(t)] ≤ D −D1(t)

+ V E

[(Q(t)

V− q(t)

)O(t)A(t)

∣∣∣∣Θ(t)

]

+ V E

∑

i∈B(t)

Ci(t)−Q(t)ri(t)

V

∣∣∣∣∣∣Θ(t)

, (19)

where we use the fact that collisions between secondary andprimary users can only happen in channels that are chosenfor sensing, i.e., i ∈ Bs(t). Next we propose the ProfitMaximization Control (PMC) policy to minimize the righthand side of inequality (19) for each timet.

5.2 Profit Maximization Control (PMC) policyIt is clear that minimizing the right hand side of (19) isequivalent to minimizing the last two terms in (19). Note thatthe last two terms are decouple in decision variables, thus wehave the two parallel parts in the PMC policy as follows:

5.2.1 Revenue MaximizationHere we determine two variables: the transmission priceq(t)and the market control decisionO(t). The optimal transmis-sion priceq(t) is obtained by solving the following revenuemaximization problem:

Maximizeq(t)D (q(t),M(t))−Q(t)

VD (q(t),M(t)) (20)

Variablesq(t) ≥ 0

To obtain the above problem formulation of revenuemaximization, we use the fact that the demand functionD(M(t), q(t))

∆= E [A(t)], which is independent of the queu-

ing states of the system.Note that the first term in (20) is just the revenue that

the operator collects from its users. The second term can beviewed as a shift of the queuing effect, which is introducedby the Lyapunov drift for system stability.

If the maximum objective in (20) (under the optimal choiceof q(t)) is positive, the operator sets the demand controlvariableO(t) = 1 and accepts the present incoming requestsA(t) at the priceq(t). Otherwise, the operator setsO(t) = 0and rejects any new requests.

5.2.2 Cost MinimizationWe determine channels selectionB(t), sensing technology(or cost)Cs(t), and power allocationP (t), by solving thefollowing optimization problem to control the costs of theoperator to provide transmission services to its users.

Minimize∑

i∈B(t)

Ci(t)−Q(t)

VE [ri(t)|Θ] (21)

Subject to (1), (3), (4)

Variables Cs(t),Bs(t),Bl(t), Pi(t) ≥ 0

To obtain the above problem formulation of cost minimiza-tion, we use the fact thatXi(t) = (1 − Si(t))Wi(t), whichis independent of the queuing state. Thus the virtual sensingcost can be updated asCs(t) = Cs(t) + (1/V )Zi(t)(1 −p0)Pmd(C

s(t)), which increases with the virtual queue andmissed detection probability.

Note that the first term in the summation in (21) is thecost of each channel. The second term in the summation is aqueuing-weighted expected transmission rate, again is a shiftintroduced by Lyapunov drift for system stability. This shiftcan be also viewed as the “gain” collected from the channelto help clear the queue.

5.2.3 Intuitions behind the PMC policyWe discuss some intuitions behind the PMC policy. Tomaximize the profit, the operator needs to perform revenuemaximization and cost minimization. To guarantee the queuingstability, some shifts (i.e., all queue-related terms) are intro-duced by the Lyapunov drift in these problems. In Appendix

7

Section D, we show that the queueing effect will increase theoptimal price announced by the operator (comparing with notconsidering queueing), as a higher price will reduce the users’demands and maintain the system stability.

The Lyapunov stochastic optimization approach providesa way to decompose a long-term average goal (e.g., thePM problem) into instantaneous optimization problems (e.g.,revenue maximization and cost minimization problems in thePMC policy). In the stochastic optimization problem, thecurrent decisions always have impacts on the future prob-lems. These impacts are characterized and incorporated bythe queueing shift terms in the instantaneous optimizationproblems. Therefore, we can achieve the long term goalthrough focusing on the instantaneous decisions in every timeslot. The flowchart for the PMC policy is illustrated in Fig. 2.

Revenge Maximization:

Determine price ��

and market control strategy �� by

solving (21)

Updating:

� ← � � 1

Update queues � and �� by (8) and (13)

Initiation: � � 0 and � � � 0

Power allocation:Compute �

� � by Algorithm 3

sensing results ��

Sensing cost selection & channel selection:

Compute �� , �� , �� by Algorithm 1

��

��

Cost Minimization:

Profit Maximization Control (PMC) Policy:

Fig. 2. Flowchart of the dynamic PMC policy

Although the revenue maximization problem is relativelyeasy to solve, the cost minimization problem is very com-plicated. It is actually a two-stage decision problem. In thefirst stage, the operator determines the sensing technology,and chooses which channels to sense and which channels tolease. Then spectrum sensing is performed to identify availablechannels. With this information, the operator further allocatesdownlink transmission power in the available channels (sensedidle ones and leasing ones). In Section 5.3, we focus ondesigning algorithms to solve the cost minimization problem.

5.3 Algorithms for Cost Minimization ProblemNow we use backward induction to solve the cost mimmationproblem.

5.3.1 The Second Stage ProblemWe first analyze the power allocation in the second stage,where the sensing resultsWi(t), the channel selectionBs(t),and the sensing technologyCs(t) have been determined.Therefore, the power allocation problem of (21) is as follows:

Maximize∑

i∈B(t)

ωi(t) log (1 + hi(t)Pi(t)) (22)

Subject to∑

i∈Bmax

Pi(t) ≤ Pmax,

Variables Pi(t) ≥ 0

where

ωi(t)∆=

{ω0

∆= E [Si(t)|Wi(t) = 1], if i ∈ Bs(t),

1, if i ∈ Bl(t).(23)

and we can calculate

E [Si(t)|Wi(t) = 1]

=p0(1− Pfa(C

s(t)))

p0(1− Pfa(Cs(t))) + (1− p0)Pmd(Cs(t)). (24)

By using the Lagrange duality theory, we can show that theproblem (22) has the following optimal solution

P ∗i (t) =

ωi(t)

(1

λ(t) −1

ωi(t)hi(t)

)+i ∈ B(t),

0 i /∈ B(t),(25)

where λ(t) is the Lagrange multiplier of the total powerconstraint (4). The optimal value ofλ(t) is the following waterfilling solution,

λ(t) =

∑i∈Bp(t)

ωi(t)

Pmax +∑

i∈Bp(t)1

hi(t)

, (26)

whereBp(t)∆= {i ∈ B(t) : Pi(t) > 0}. Note that (26) is a

fixed-point equation ofλ(t), and the precise value ofλ(t) isnot given here.

When the values of all parameters (i.e., hi(t), ωi(t)) aregiven, we can use a simple water level searching Algorithm 1and similar as the searching algorithms in [32], [33]) todetermine the exact optimal value ofλ(t). In the followingpseudo code of Algorithm 1, we define a functionΛ(m) asfollows:

Λ(m)∆=

∑m

i=1 ωi

Pmax +∑m

i=11hi

.

Algorithm 1 Power Allocation1: Rearrange the channel indicesi ∈ Bmax as a decreasing

order ofωi(t)hi(t)2: m← |B(t)|, λ⇐ Λ(m)3: while λ ≥ hm(t)ωm(t) do4: m← m− 15: λ⇐ Λ(m)6: end while

The main complexity in this algorithm is to sort the channelsaccording to the channel gains. We can adopt established sort-ing algorithms [34] to obtain the index rearrangement with acomplexityO(|Bmax| log(|Bmax|)). Thus the total complexityof Algorithm 1 isO(|Bmax|3 log(|Bmax|)).

5.3.2 The First Stage ProblemLet us consider the first stage problem to determine the sensingtechnology, the sensing set, and the leasing set. Note that sincethe sensing has not been performed at this stage yet, thus thesensing resultWi(t) is not known. We denote

αi(t) =

{α0

∆= E [Si(t)Wi(t)] if i ∈ Bs(t)

1 if i ∈ Bl(t),(27)

and we can calculate

E [Si(t)Wi(t)] = p0(1 − Pfa(Cs(t))) (28)

8

Substitute the optimal power allocation (25) into the prob-lem (21), we have

Minimize∑

i∈B(t)

Ci(t)−Q(t)

Vαi(t)

(log

(hi(t)ωi(t)

λ(t)

))+

(29)

Subject to Bs(t) ⊆ Bsmax, B

l(t) ⊆ Blmax, C

s(t) ∈ C

Variables Bs(t),Bl(t), Cs(t)

We first consider the above problem for a fixed sensing costCs(t). This problem is a combinatorial optimization problemof Bs(t) andBs(t). The worst case of searching complexity(i.e., exhaustive searching) can beO(2|Bmax|), exponential inthe number of total channels.

However, we can reduce the complexity by exploring thespecial structure of this problem.

Proposition 1: (Threshold Property)• We rearrange the leasing channel indicesi ∈ Bl

max in thedecreasing order ofgi(t), which is defined as

gi(t)∆= hi(t) exp

(−Ci(t)Q(t)V

). (30)

There exists a threshold indexilth, such that a channeli ischosen for leasing (i.e., i ∈ Bl(t)) if and only if i ≤ ilth.

• We rearrange the sensing channel indicesj ∈ Bsmax in

the decreasing order ofgj(t), which is defined as

gj(t)∆= ω0hj(t) exp

(−

Cj(t)Q(t)V

α0

). (31)

For all leasing channelsj ∈ Bsmax, there exists a threshold

indexjsth, such that a channelj is chosen for sensing (i.e.,j ∈ Bs(t)) if and only if j ≤ jsth.Proof: For each channel in the optimal channel selection

set i ∈ B∗(t), it satisfies the following condition

Ci(t)≤Q(t)

Vαi(t)

(log

(ωi(t)hi(t)

λ(t)

))+

. (32)

This result is easy to see from the objective function in (29):to optimize the profit, we should only pick the channel withits cost no larger than its gain. Thus by (30) and (31), theoptimization problem in (29) can be written in the followingequivalent form:

Maximize∑

i∈Bl(t)

(log

(gi(t)

λ(t)

))+

+α0

∑

j∈Bs(t)

(log

(gj(t)

λ(t)

))+

(33)

Subject to Bs(t) ⊆ Bsmax, Bl(t) ⊆ Bl

max

Variables Cs(t),Bs(t),Bl(t)

Thus by thelog function in the objective of (33), the thresholdproperty immediately follows.

This proposition suggests that we should select the channelwith a large gi (for leasing channels) orgj (for sensingchannels). Note that as defined in (30) and (31),gi and gjare equal to channel information (i.e., hi for leasing channels,ωjhj for sensing channels) multiplying a decaying factorrelated to the channel cost. They can be understood asvirtualchannel gains by taking channel costs into consideration. Alarge value ofgi or gj means that the channel is cost-effective,i.e., the channel has a good channel gain as well as a low cost.

By Proposition 1, it is clear that we can obtain the optimalchannel selection by an exhaustive search of the optimalsensing and leasing thresholds. Algorithm 2 gives a pseudocode for the searching procedure.

Algorithm 2 Optimal Channel Selection (for a givenCs(t))

1: procedure COMPUTING B(Cs(t))2: invoke procedure SearchingThreshold(Cs) to calculate

Υl andΥs

3: U(Cs(t))← 04: for i = 0, 1 . . . ,Υl do5: for j = 0, 1 . . . ,Υs do6: Calculateλ(t) as (26) withBp(t) = Bl

i ∪ Bsj

7: if gi(t) > λ(t) andgj(t) > λ(t) then8: CalculateU(i, j)9: if U(i, j) < U(Cs(t)) then

10: U(Cs(t))← U(i, j)11: B(Cs(t))← Bl

i ∪ Bsj

12: end if13: end if14: end for15: end for16: end procedure

In Algorithm 2, U(i, j) denotes the optimal value of (33)with the channel selection setB = Bl

i ∪ Bsj . To decrease the

number of searching loops, we can first run Algorithm 5 (inAppendix C) to determine the maximum possible thresholdsΥl for leasing channels orΥs for sensing channels. (If we donot run Algorithm 5 , we can just setΥl = |Bl

max| andΥs =|Bs

max|. Whether we run Algorithm 5 or not, the complexityof Algorithm 2 is no worse thanO(|Bs

max| × |Blmax|).) Thus

the searching complexity is reduced toO(|Bmax|2), given thechannel indices are rearranged as in the Proposition 1. We canadopt established sorting algorithms [34] to obtain the indexrearrangement with a complexityO(|Bmax| log(|Bmax|)). Thusthe total complexity of finding the optimal channel selectionis O(|Bmax|3 log(|Bmax|)).

Note that in real systems, the channel conditions and theleasing cost may not change as frequently as every time slot.We usually can update these network parameters every timeframe (which is composed by several time slots instead ofone time slot). Accordingly, the above algorithm will also beoperated based on the time frames, which will greatly reducethe computation complexity in practice.

Furthermore, let us find the optimal sensing costCs(t) byenumerating all possible sensing costsCs(t) ∈ Cs. For thesensing costCs(t), we denote the objective value in (33) asU(Cs(t)) and the optimal channel selection set asB(Cs(t)).The corresponding pseudo code is given in Algorithm 3, thecomplexity of which isO(|C| × |Bmax|3 log(|Bmax|)).

So far, we have completely solved the two-stage optimiza-tion problem in (21). For each timet, the operator first runs Al-gorithm 3 to choose the channel setsB∗(t) = Bs∗(t)∪Bl∗(t).Then it uses the sensing technology with a costCs∗(t) tosense channels inBs∗(t). Based on the sensing results, itfurther runs Algorithm 1 to determine the power allocationP ∗i (t), i ∈ B

∗(t).

9

Algorithm 3 Optimal Sensing Cost and Channel Selection1: U∗ ← 02: for Cs(t) ∈ Cs do3: Determine the optimal channel selectionB(Cs(t)) (see

Algorithm 2)4: CalculateU(Cs(t))5: if U∗ > U then6: U∗ ← U , Cs∗(t)← Cs(t), B∗(t)← B(Cs(t))7: end if8: end for

5.3.3 Sensing vs. LeasingWe are also interested in how the PMC policy makes the besttradeoff between sensing and leasing based on the sensing costCs(t) and the leasing costCl(t). To make the comparison easyto understand, we will consider perfect sensing with no sensingerrors (i.e., ω0 = 1 and α0 = p0). We will further assumethat a leasing channeli and a sensing channelj have thesame channel gainhi(t) = hj(t). Finally, we assume that twochannels have the same availability-price-ratio,i.e., the costssatisfy Cs(t) = α0C

l(t). We want to answer the followingquestion:is the PMC policy indifferent in choosing either ofthe two channel?

By (30) and (31), we havegi = gj for these two channels.By (33), we can calculate the net gains by channeli and

j:(log(

gi(t)λ(t)

))+≥ α0

(log(

gj(t)λ(t)

))+. To maximize the

objective in (33), it is clear that PMC policy will prefer theleasing channeli over the sensing channelj, and this tendencyincreases asα0 decreases. If we view the channel unavailabil-ity (1−α0) as the risk of choosing the sensing channel, thenthe PMC policy is a risk averse one. This is mainly due tothe concavity of the rate function. This preference order willalso hold in the imperfect sensing case, in which case we will

havegi > gj and(log(

gi(t)λ(t)

))+≥ α0

(log(

gj(t)λ(t)

))+.

5.4 Performance of the PMC Policy

We can characterize the performance of the PMC Policy asfollows:

Theorem 1: For any positive valueV , the PMC Policy hasthe following properties:(a) The queue stability (11) and collision constraints (12)are

satisfied. The queue length is upper bounded by

Q(t) ≤ Qmax∆= V qmax+Amax, ∀ t; (34)

and the virtual queue length is upper-bounded by

Zi(t) ≤ Zmax∆= κ(V qmax +Amax) + 1, ∀ i, t. (35)

where

κ∆=

rmaxp0(1− Pfa(Cs0)))

(1− p0)Pmd(Cs0)

andCs0 ∆= max

Cs

p0(1−Pfa(Cs)))

(1−p0)Pmd(Cs) .

(b) The average profitRPMC obtained by the PMC policysatisfies

inf RPMC ≥ R∗−O(1/V ), (36)

whereR∗

is the optimal value of the PM problem.According to the Little’s law, the average queuing delay

is proportional to the queue length. Thus users experiencebounded queuing delays under the PMC algorithm by (34).By (36), we find that the profit obtained by the PMC Policycan be made closer to the optimal profit by increasingV .However, asV increases, the queuing delay also increases asshown in (34). The best choice ofV depends on the desiredtrade-off between queuing delay and profit optimality.

A detailed proof of Theorem 1 is provided in Appendices Aand B.

5.5 Extension: More General Model of Primary Ac-tivitiesIn the previous analysis, we have assumed that primaryusers’ activities in each sensing channel follow a simple i.i.d.Bernoulli random process. Next we will show that the PMCpolicy can be easily adapted to the more general Markovchain model of the primary users’ activities shown in Fig. 3.In this model, for any timet, Si(t) is unknown, but thehistory informationSi(t−1) is known, and also the transition

probabilitiesPr(Si(t) = s′|Si(t − 1) = s)∆= pis→s′ , s ∈

{0, 1}, s′ ∈ {0, 1}, i ∈ Bsmax are known from long-time

statistics.

� �

p1� 0

p0� 1

p0� 0p1� 1

Fig. 3. Markov chain model of the PUs’ activities

All previous analysis for PMC policy will still hold if weupdate two parametersωi(t) andαi(t) as follows:

ωi(t)∆=

{E [Si(t)|Wi(t) = 1, Si(t− 1)], if i ∈ Bs(t)

1, if i ∈ Bl(t)

where

E [Si(t)|Wi(t) = 1, Si(t− 1) = s]

=pis→1(1− Pfa(C

s(t)))

pis→1(1− Pfa(Cs(t))) + ps→0Pmd(Cs(t));

and

αi(t) =

{E [Si(t)Wi(t)|Si(t− 1)] if i ∈ Bs(t)

1 if i ∈ Bl(t)

where

E [Si(t)Wi(t)|Si(t− 1) = s] = pis→1(1− Pfa(Cs(t))).

There is no change in the revenue maximization part,and the cost minimization part still involves a combinatorialoptimization problem. But the complexity of solving costminimization problem becomesO(|C| × 2|B

smax

||Blmax + 1|),

since we lose the structure information in sensing channels,i.e., the threshold structure does not hold for sensing channels.In the worst case (Bmax = Bs

max, Blmax = ∅ ), it comes back

to O(|C|×2|Bmax|), which is the complexity of the exhaustivesearch without considering the threshold structure.

10

A C-MVNO

�� !"#$%&

'()*+ ,-./ 0

12345

Fig. 4. Heterogeneous user model: Users in a hexagonsare nearby homogeneous users, who have the samechannel experience. Users in deferent hexagons can havedifferent channel experience.

Let us further consider a special Markov chain model wherethe transition probability for each sensing channel is the same,i.e., Pr(Si(t) = s′|Si(t−1) = s)

∆= ps→s′ , ∀i ∈ Bs

max. In thismodel, all sensing channels can be categorized into two types,channels being busy in the last slot (i.e., Si(t − 1) = 0), orchannels being idle in the last slot (i.e., Si(t − 1) = 1). Wecan still show threshold structures for both types. Thus thecomplexity is reduced toO(|C| × |Bmax|4 log(|Bmax|)).

The above analysis shows that it is critical to exploit theproblem structure to reduce the algorithm complexity.

6 HETEROGENEOUS USERS

In Section 4, we adopt the single queue analysis for homoge-neous users who are assumed to be located nearby and havethe same channel condition on each channel. However, thesingle queue analysis no longer works for a more generalscenario of heterogeneous users, where users can be locatedat different places, and have different channel conditions. Inthis section, we introduce the multi-queue model to deal withthe heterogenous user scenario as shown in Fig. 4.

We divide the total coverage of the secondary base stationinto J

∆= {1, 2, . . . , J} disjoint small areas (illustrated as

hexagons in Fig. 4) according to users’ different channelexperiences. Users in one of these small area are nearbyhomogeneous users. They share the same channel conditions,and form a queue based on the FCFS discipline. We useQj(t)to denote the queue length in areaj. Since the queue and thecorresponding area is one-to-one mapping, we also call theusers in areaj as queuej users.

For each queuej ∈ J , hij(t) represents the users’ channelgain to channeli ∈ Bmax at time t, which follows an i.i.ddistribution over time. The indicator variableTij(t) ∈ {0, 1}denotes the operator’s channel assignment at time t:Tij(t) = 1 if channeli is allocated to queuej, andTij(t) = 0otherwise. Meanwhile, the assignmentTij must satisfy

∑

j∈J

Tij(t) ≤ 1, ∀t. (37)

The power allocation for queuej on channeli is denotedby Pij(t). The total power allocation must satisfy

∑

j∈J

∑

i∈Bmax

Pij(t) ≤ Pmax, ∀t, (38)

Thus the rate of queuej ∈ J can be calculated

rj(t) =∑

i∈B(t)

Ii(t)Tij(t) log

(1 +

hij(t)Pij(t)

Tij(t)

)(39)

whereIi(t) is the transmission result in channeli, followingthe same definition in Section 3.4.

For each queuej, we follow the same demand model as inSection 3.5 for homogenous users. We assume the number ofincoming users, the market state, and the user’s instantaneousdemand are i.i.d among different queues, and denote themasNj(t), Mj(t), andAj(t) respectively. The market controlvariable and price for queuej are denoted asOj(t) andqj(t).The queuing dynamic for queuej is as follows:

Qj(t+ 1) = (Qj(t)− rj(t))++Oj(t)Aj(t). (40)

Thus the homogeneous user model in Section 4 can also beviewed as a special case of the heterogeneous user model,whereJ is a singleton.

6.1 Multi-queue Profit Maximization Control Policy6.1.1 Revenue MaximizationFor any queuej ∈ J , we compute the optimal transmissionprice q∗j (t) by solving the following problem.

Maximize

(qj(t)−

Qj(t)

V

)D (qj(t),Mj(t)) (41)

Variables qj(t) ≥ 0

If the maximum objective in (41) is positive, the operator setstransmission control variableO∗

j (t) = 1 and accepts users’new file download requests at the priceq∗j (t). Otherwise, theoperator setsO∗

j (t) = 0 and rejects any new requests.

6.1.2 Cost MinimizationWe solve the following optimization problem to determinesensing cost and resource allocation:

Minimize∑

i∈B(t)

Ci(t)−∑

j∈J

Qj(t)

VE [rj(t)] (42)

Subject to (1), (3), (38), (37)

Variables Cs(t),Bs(t),Bl(t), Pij(t) ≥ 0, Tij(t) ∈ {0, 1}

where the costCi(t) follows the same definition of (18).Similar to the homogenous model in Section5.3, it is a two-stage decision problem. In the first stage, we determine thesensing technology, and choose the sensing channels and theleasing channels. Then in the second stage, we determine thechannel assignment and power allocation based on the sensingresults. We use backward induction to solve this problem (42).To simplify the notation, we will ignore the time index in thefollowing analysis.

We first analyze the channel assignment and power allo-cation in the second stage. In this stage, since the sensingresultsWi, the channelsBs, and the sensing technologyCs

are determined, the second stage problem of (21) is shown asfollows:

Maximize∑

j∈J

∑

i∈B

QjωiTij log

(1 +

hijPij

Tij

)(43)

Subject to (38), (37)

Variables Pij ≥ 0, Tij ∈ {0, 1}

11

whereωi follows the same definition of (23).Compared to the power allocation problem in (22), the

binary channel assignment variablesTij ’s make the secondstage problem in (43) much more complex.

We first solve problem (43) assuming fixedTij ’s, in whichcase the power allocation problem is a convex optimizationproblem. Following the same method of solving power allo-cation for homogeneous users as in (25), we have

Pij = TijQjωi

(1

λ−

1

Qjωihij

)+

, i ∈ B. (44)

whereλ is the Lagrange multiplier of the total power constraint(38), which satisfies

λ =

∑i∈Bp

ωiQjTij

Pmax +∑

i∈Bp

Tij

hi(t)

, (45)

whereBp∆= {i ∈ B : Pij > 0}. WhenTij is known, we can

design a simple search algorithm similar to Alg. 1 to determinethe optimal value ofλ.

We then substitute this result in (43), and further maximizethe objective overTij ’s. Then we have

Maximize∑

i∈B

ωi

∑

j∈J

QjTij

(log

(ωiQihi

λ

))+

(46)

Subject to∑

j∈J

Tij ≤ 1

Variables Tij ∈ {0, 1}

For each channeli ∈ B, let us denote the set

J ∗i

∆=argmax

j∈JQj

(log

(ωiQjhij

λ

))+

(47)

Here the solutionJ ∗i is a set of indices of the chosen queues.

If J ∗i is a singleton, then we denote its unique element asj∗i .

If J ∗ is not singleton, since all elements inJ ∗ lead to thesame value in objective, then we can randomly pick one ofthe its element and denote it asj∗i .

Since the objective in problem (46) is linear inTij , it iseasy to see the optimal solution is

Tij =

{1, if j = j∗i , i ∈ B

0, otherwise.(48)

Now let us consider how to calculate the value ofλ andj∗i . By (45) and (47), we find that they are actually coupledtogether. To determineλ in (45), we need to knowTij (orequivalentlyJ ∗

i , j∗i ), i.e., which queue is chosen for whichchannel. But determiningJ ∗

i in (47) requires the value ofλ. One way to solve this problem is to enumerate everypossible channel assignment combinations to find the solutionssatisfying both (45) and (47). Since each channeli ∈ Bmax canbe assigned to one ofJ queues, there are a total of|Bmax|J

channel assignment combinations. When theJ or |Bmax| islarge, the complexity can be very high. However, we canreduce the search complexity by exploring the special structureof the problem.

Property 1: For each channel allocated positive poweri ∈Bp, we have

J ∗i = argmax {Qj |hij >

λ

Qjωi

, j ∈ J }. (49)

This property comes from (47). This means that for aparticular channeli, if the channel gain for the longest queueQj is good enough (i.e., hij > λ

Qjωi), we should assign

channeli to queue with the longest queue length. IfJ ∗i is a

singleton, then we denote its unique element asj∗i . Otherwise,

we denoteJ ∗i

∆= argmax {hij | j ∈ J ∗

i }. In this case, thereare multiple channels with the same channel gain and thesame queue length, and we can randomly pick one and denoteit as j∗i . Thus we can searchλ andJ ∗

i (and alsoj∗i ) by asimple greedy algorithm as follows. First, for all channels, weassumeJ ∗

i = argmaxj∈J Qj , and calculate the value ofλby the waterfilling algorithm (as the procedure “Waterfilling”in Alg. 4). For each unchosen channel,i.e., the channeliwith ωiQj∗

ihij∗

i≤ λ, we check whetherωiQjhij > λ can be

satisfied when another queue is chosen instead ofj∗i . If thereis some setJi of queues satisfyingωiQjhij > λ, we replaceJ ∗i with the one with the longest queue length in this set,

i.e., argmaxj∈JiQj . We repeat the process iteratively until

we find theλ and j∗i that satisfies both (45) and (47). Thepseudo code is given in Alg. 4. To simplify the expression ofAlg. 4, with a little abuse of notations, we denotehi = hij∗

i,

Qi = Qj∗i, andΛ(m)

∆=

∑mi=1

ωiQi

Pmax+∑

mi=1

1

hi

.

Algorithm 4 Channel Assignment1: J ∗

i ← argmaxj∈J Qj

2: procedure WATERFILLING(hi(t), Qi(t))3: Rearrange the channel indicesi ∈ Bmax in the de-

creasing order ofωi(t)Qi(t)hi(t)4: m← |B(t)|, λ⇐ Λ(m)5: while λ ≥ hm(t)ωm(t) do6: m← m− 17: λ⇐ Λ(m)8: end while9: end procedure

10: while ωi(t)hij(t)Qj(t) > λ, ∀j, ∀i > m, do11: J ∗

i ← argmax {Qj|ωi(t)hij(t)Qj(t) > λ}12: invoke procedure Waterfilling(hi(t), Qi(t))13: end while

The complexity of Alg. 4 isO(|Bmax|3 log(|Bmax|), sincethe while loop runs no more than|Bmax| times in theworst case, and the complexity of waterfilling part isO(|Bmax|2 log(|Bmax|) (the same as the waterfilling power al-location algorithm in Section 5.3.1). Compared to the exhaus-tive search, the complexity of solving the channel assignmentis greatly reduced.

With the solution of channel assignment, we can update thepower allocation solution of (42) as

P ∗ij =

ωiQj

(1λ− 1

ωiQjhi

)+, if j = j∗, i ∈ B,

0 otherwise,

where the value ofλ, hi andQi are calculated by Alg. 4.After solving the second stage problem, we move to the first

stage. Following the channel assignment in (48), we find that

12

the first stage problem for the heterogeneous users is the samewith the one of homogeneous users problem in (29). We cansimply run the same Alg. 3 to determine sensing technologyCs∗(t), sensing channel setBs∗(t) and leasing channel setBl∗(t).

6.2 The Performance of M-PMC PolicyNext we show the performance bounds of the M-PMC Policy.The proof method is similar to that of Theorem 1, and thedetails are omitted due to space limit.

Theorem 2: For any positive valueV , the M-PMC Policyhas the following properties:(a) The queue stability (11) and collision constraints (12)are

satisfied. The queue length is upper bounded by

Qj(t) ≤ Qmax ∆= V qmax+Amax, ∀ t, (50)

and the virtual queues are bounded by

Zi(t) ≤ Zmax ∆= κ(V qmax + Amax) + 1, ∀ i, t. (51)

where κ∆= rmaxp0

p0(1−Pfa(Cs(t)))+(1−p0)Pmd(C

s(t))(1−p0)

,andCs0 denotes the highest sensing cost.

(b) The average profitRM−PMC obtained by the M-PMCpolicy satisfies

inf RM−PMC ≥ R∗−O(1/V ), (52)

whereR∗

is the optimal value of the multi-queue PMproblem.

7 SIMULATIONIn this section we provide simulation results for PMC andM-PMC policies.

We conduct simulations with the following parameters. Thenumber of incoming users in each slot satisfies a Poissondistribution with a rateD(q(t),M(t)) = 1

M(t) (q(t)−5)2. Themarket stateM(t) satisfies Bernoulli distribution,M(t) = 1with probability 0.5, andM(t) = 2 with probability 0.5. Thefile length of each user satisfies the i.i.d. (discrete) uniformdistribution between 1 and 10. There are 32 channels in total.20 of them belong to the sensing bandBs

max, and the rest12 channels belong to the leasing bandBl

max. The primarycollision probability tolerant levels are set asηi = 0.001for sensing channeli = 1, 2, . . . , 10, and ηi = 0.005 forsensing channeli = 11, 12, . . . , 20. The channel gainhi ofeach channel satisfies i.i.d. (continuous) Rayleigh distributionwith parameterσ = 4.5 The total power constraint of thebase station isPmax = 8. There are 3 different sensingtechnologies with costsCs = {0 (not sensing at all), 0.1, 0.5}.The corresponding false alarm probabilities arePfa ={0.5, 0.1, 0.008}, and the missed detection probabilities arePmd = {0.5, 0.08, 0.005}. We assume the idle time proba-bility of sensing bandp0 is 0.6, and the control parameterV ∈ {5, 10, 50, 100, 200}.

Figure 5 shows a collision situation of all sensing channelswith the control parameterV = 100. We find that primaryusers’ collision tolerant bound (12) is satisfied as time in-creases. We also find that we obtain similar curves for thecollision probabilities with other values of control parameterV .

0 1000 2000 3000 4000 50000

0.002

0.004

0.006

0.008

0.01

0.012

Time t

Col

lisio

n P

roba

blity

X

Collision situation of all sensing channels (V=100)

PU’s collision tolerant boundfor Channel 1~10

PU’s collision tolerant boundfor Channel 11~20

Fig. 5. A collision situation of all sensing channels withV = 100

Figure. 6 (a) shows that the average queue length growslinearly in V , and is always less than the worst case boundV qmax + Amax. Figure. 6 (b) shows that the average profitachieved by PMC policy converges quickly asV grows, andis close to the maximum profit whenV ≥ 100.

0 50 100 150 2000

500

1000

1500

V

Ave

rage

que

ue le

ngth

(a) Average queue length vs. Parameter V

Average queue length of PMC algorithmAverage queue length upper bound

0 100 20050 1500

20

40

60

V

Ave

rage

pro

fit

(b) Average profit vs. Parameter V

Fig. 6. (a) Average queue length vs. Parameter V , (b)Average profit vs. Parameter V

We further vary the idle time probability of sensing bandp0from 0 to 1. In Fig. 7, we show the average profit with differentsensing available probabilitiesp0 ∈ [0, 1] and different fixedsensing technologies. The black curve is with zero sensingcost, wherePfa = Pmd = 0.5, which means the operatordoes not perform sensing and takes random guesses of primaryusers’ activities in sensing channels. The blue curve is withthe low sensing cost0.1, wherePfa = 0.1 andPmd = 0.08.The purple curve is with the high sensing cost0.5, wherePfa = 0.008 and Pmd = 0.005. The red curve is thePMC policy, which adaptively choose sensing cost from theabove three (sensing costCs = 0, 0.1, 0.5). When the sensingavailable probability is small (e.g.,p0 ∈ [0, 0.2]), all strategiestend to only choose the leasing channels, thus all curves obtainsimilar profits. When the sensing available probabilityp0further increases, the advantage of exploring sensing channels

13

0 0.2 0.4 0.6 0.8 130

35

40

45

50

55

60

Sensing Available Probablity: p0

Ave

rage

Pro

ft

Average profits with different sensing techniques

zero sensing costlow sensing costhigh sensing costoptimal sensing cost

Fig. 7. Average Profit with different sensing technologies

becomes more significant. The performances for strategiesusing the zero and low sensing cost are not good. The reason isthat their detection accuracy is not good enough. To achievethe primary users’ collision bounds, these strategies choosesensing channels less often, and replace with more expensiveleasing channels. When the sensing available probability ishigh and close to 1, sensing seems unnecessary. Therefore,the performance increases as sensing cost decreases, wherethezero cost is the best and the high cost is the worst. The PMCpolicy (the red curve) adaptively chooses the sensing cost,i.e.,whenp0 is medium, it utilizes the high sensing cost strategy inmost of time slots; asp0 keeps increasing, it gradually changesto utilize the low cost and zero cost strategies more frequently;and whenp0 goes to 1, it utilizes the zero sensing cost strategyin most of time slots. It has the best performance, since it cantake advantage of different sensing technologies for differentsensing available probabilities.

For the M-PMC policy, we conduct simulations for a simpletwo-queue system. For queue-1, the channel gainhi1 of eachchannel satisfies i.i.d. (continuous) Rayleigh distributions withparameterσ = 4.5. For queue-2, the channel gainhi2 of eachchannel satisfies i.i.d. (continuous) Rayleigh distributions withparameterσ = 5.5. This is because queue-2 users are closerto the operator’s base station than queue-1 users.

Figure 8 (a) shows that the average transmission rateobtained by queue-2 users is higher than that of the queue-1 users. This is because queue-2 users usually have betterchannel conditions, and M-PMC policy prefers to allocatemore powers to better channels to improve the transmissionrate. Figure 8 (b) shows that the revenues obtained by theoperator from users of two queues are almost the same whenall queues are stable. It is an interesting observation. We canunderstand it in this way: when two queues make differentrevenue, to maximize the profit (also the revenue), the operatorwill allocate more transmission rates to the queue with ahigher revenue. Thus the length of the queue with a highertransmission rate will be shortened, and the negative queuingeffect in revenue maximization problem will be soon diluted.It further leads to a decreasing price and a decreasing revenue.

In contrast, the length of the queue with a lower transmissionrate increases, which results in an increasing price and anincreasing revenue. Therefore, when all queues are stablefinally, the average revenue generated by each queue is thesame.

0 2000 4000 6000 8000 100000

5

10

15

Time t

Ave

rage

Tra

nsm

issi

on R

ate (a) Average rate

Average rate of queue 1Average rate of queue 2

0 2000 4000 6000 8000 100000

20

40

60

80

Time t

Rev

enue

(b) Average Revenue

Average revenue of queue 1Average revenue of queue 2

Fig. 8. (a) Average transmission rates of a two-queue M-PMC policy, (b) Average revenues of a two-queue M-PMCpolicy

8 CONCLUSIONIn this paper, we study the profit maximization problem ofa cognitive mobile virtual network operator in a dynamicnetwork environment. We propose low-complexity PMC andM-PMC policies which perform both revenue maximizationwith pricing and market control, and cost minimization withproper resource investment and allocation. We show that thesepolicies can achieve arbitrarily close to the optimal profit, andhave flexible trade-offs between profit optimality and queuingdelay.

We also find several interesting features in these close-to-optimal policies. In revenue maximization, the dynamicpricing strategy performs the functionality of congestioncon-trol to users’ demands,i.e., the longer the queue length ofdemands, the higher price the operator should charge. In costminimization, the operator is risk averse towards spectruminvestment, and prefers stable leasing spectrum to unstablesensing spectrum with the same channel condition and thesame availability-price-ratio.

In this paper, we only looked at the issue of elastic traffic.It would be worthwhile to incorporate inelastic traffic, whichusually has strict constraints on transmission rates and delays.Typical examples include real-time multimedia applications,e.g., audio streaming, Video on Demand (VoD), and Voiceover IP (VoIP). In the most general case, we can considera hybrid system with both elastic and inelastic traffic, whichis more realistic and practical. In addition, as mentioned inSection 2, the literature about competition in cognitive radionetworks mainly focus on the static network scenario. It isalso interesting to extend our dynamic model to incorporatecompetition among several network operators.

14

REFERENCES

[1] S. Li, J. Huang, and S. Li, “Profit maximization of cognitive virtualnetwork operator in a dynamic wireless network,” inProc. of IEEEICC, 2012, pp. 1–6.

[2] M. McHenry, P. Tenhula, and D. McCloskey, “Chicago spectrum oc-cupancy measurements and analysis and a long-term studies proposal,”Shared Spectrum Company, Tech. Rep., 2005.

[3] “Fcc released the final rules for the cognitive use of tv white spacesin the us,” 2010, federal Communications Commission:Report FCC-10-174.

[4] I. Akyildiz, W. Lee, M. Vuran, and S. Mohanty, “Next genera-tion/dynamic spectrum access/cognitive radio wireless networks: a sur-vey,” Computer Networks, vol. 50, no. 13, pp. 2127–2159, 2006.

[5] Q. Zhao and B. Sadler, “A survey of dynamic spectrum access,” IEEESignal Processing Magazine, vol. 24, no. 3, pp. 79–89, 2007.

[6] L. Duan, J. Huang, and B. Shou, “Cognitive mobile virtualnetworkoperator: Investment and pricing with supply uncertainty,” in Proc. ofIEEE INFOCOM, 2010.

[7] ——, “Investment and pricing with spectrum uncertainty:A cogni-tive operators perspective,”IEEE Transactions on Mobile Computing,vol. 10, no. 11, pp. 1590 – 1604, November 2011.

[8] “Introduction of mvno,” internet Open Resource: onlineavail-able at http://en.wikipedia.org/wiki/Mobilevirtual network operatorand http://www.mobilein.com/whatis a mvno.htm.

[9] C. Camaran and D. D. Miguel, “Mobile virtual network operator(mvno) basics: What is behind this mobile business trend,” ValorisTelecommunication Practice, Tech. Rep., 2008.

[10] “List of mvno worldwide,” internet Open Resource: online available athttp://www.mobileisgood.com/mvno.php.

[11] C. Stevenson, G. Chouinard, Z. Lei, W. Hu, S. Shellhammer, andW. Caldwell, “Ieee 802.22: The first cognitive radio wireless regionalarea network standard,”IEEE Communications Magazine, vol. 47, no. 1,pp. 130–138, 2009.

[12] A. Al Daoud, T. Alpcan, S. Agarwal, and M. Alanyali, “A stackelberggame for pricing uplink power in wide-band cognitive radio networks,”in Proc. of IEEE CDC, 2008, pp. 1422–1427.

[13] H. Yu, L. Gao, Z. Li, X. Wang, and E. Hossain, “Pricing foruplink powercontrol in cognitive radio networks,”IEEE Transactions on VehicularTechnology, vol. 59, no. 4, pp. 1769–1778, 2010.

[14] J. Jia and Q. Zhang, “Competitions and dynamics of duopoly wirelessservice providers in dynamic spectrum market,” inProc. of ACMMobihoc, 2008, pp. 313–322.

[15] L. Duan, J. Huang, and B. Shou, “Duopoly competition in dynamicspectrum leasing and pricing,”IEEE Transactions on Mobile Computing,vol. 11, no. 11, pp. 1706 – 1719, November 2012.

[16] O. Ileri, D. Samardzija, and N. Mandayam, “Demand responsive pricingand competitive spectrum allocation via a spectrum server,” in Proc. ofIEEE DySPAN, 2005, pp. 194–202.

[17] J. Elias and F. Martignon, “Joint spectrum access and pricing in cognitiveradio networks with elastic traffic,” inProc. of IEEE ICC, 2010, pp. 1–5.

[18] D. Niyato, E. Hossain, and Z. Han, “Dynamics of multiple-seller andmultiple-buyer spectrum trading in cognitive radio networks: a game-theoretic modeling approach,”IEEE Transactions on Mobile Computing,pp. 1009–1022, 2008.

[19] S. Sengupta, M. Chatterjee, and S. Ganguly, “An economic frameworkfor spectrum allocation and service pricing with competitive wirelessservice providers,” inProc. of IEEE DySPAN, 2007, pp. 89–98.

[20] L. Gao, Y. Xu, and X. Wang, “Map: Multi-auctioneer progressive auctionfor dynamic spectrum access,”IEEE Transactions on Mobile Computing,vol. 10, no. 8, pp. 1144 – 1161, August 2011.

[21] J. Jia, Q. Zhang, Q. Zhang, and M. Liu, “Revenue generation fortruthful spectrum auction in dynamic spectrum access,” inProc. of ACMMobiHoc, 2009, pp. 3–12.

[22] L. Huang and M. Neely, “The optimality of two prices: Maximizingrevenue in a stochastic communication system,”IEEE/ACM Transactionson Networking, vol. 18, no. 2, pp. 406–419, 2010.

[23] R. Urgaonkar and M. Neely, “Opportunistic scheduling with reliabilityguarantees in cognitive radio networks,”IEEE Transactions on MobileComputing, vol. 8, no. 6, pp. 766–777, 2009.

[24] M. Lotfinezhad, B. Liang, and E. Sousa, “Optimal controlof constrainedcognitive radio networks with dynamic population size,” inProc. ofIEEE INFOCOM, 2010, pp. 1–9.

[25] M. Weiss, S. Delaere, and W. Lehr, “Sensing as a service:An explorationinto practical implementations of DSA,”Proc. of IEEE DySPAN, 2010.

[26] “Digital dividend: Geolocation for cognitive access,” Independent regu-lator and competition authority for the UK communications industries,Tech. Rep., 2010.

[27] A. Anandkumar, N. Michael, and A. Tang, “Opportunisticspectrumaccess with multiple users: learning under competition,” in Proc. of IEEEINFOCOM, 2010, pp. 1–9.

[28] K. Woyach, P. Pyapali, and A. Sahai, “Can we incentivizesensing in alight-handed way?” inProc. of IEEE DySPAN, 2010, pp. 1–12.

[29] M. Sharma and X. Lin, “Ofdm downlink scheduling for delay-optimality: Many-channel many-source asymptotics with general arrivalprocesses,” inProc. of IEEE ITA, 2011.

[30] M. Neely, “Stochastic Network Optimization with Application to Com-munication and Queueing Systems,”Synthesis Lectures on Communica-tion Networks, vol. 3, no. 1, pp. 1–211, 2010.

[31] S. Li, J. Huang, and S. Li, “Dynamic profit maximization ofcognitive mobile virtual network operator,” Tech. Rep., available athttp://jianwei.ie.cuhk.edu.hk/publication/DynamicCVNOTechReport.pdf.

[32] J. Huang, V. Subramanian, R. Agrawal, and R. Berry, “Downlinkscheduling and resource allocation for ofdm systems,”IEEE Transac-tions on Wireless Communications, vol. 8, no. 1, pp. 288–296, 2009.

[33] D. Palomar and J. Fonollosa, “Practical algorithms fora family ofwaterfilling solutions,”IEEE Transactions on Signal Processing, vol. 53,no. 2, pp. 686–695, 2005.

[34] D. Knuth et al., “Sorting and searching, the art of computer program-ming, vol. 3,” 1973.

[35] K. Talluri and G. Van Ryzin,The Theory and Practice of Revenue Man-agement, International Series in Operations, Research and ManagementScience, Vol. 68. Springer Verlag, 2005.

Shuqin Li received Ph.D. in Information Engi-neering from The Chinese University of HongKong in 2012. She now works as a researchscientist in Research and Innovation, Alcatel-Lucent Shanghai Bell Co., Ltd. Her researchinterests include resource allocation, pricing andrevenue management in communication net-works, applied game theory, contract theory andincentive mechanism design in network eco-nomics, network coding and stochastic networkoptimization.

Jianwei Huang is an Assistant Professor in theDepartment of Information Engineering at theChinese University of Hong Kong. He receivedPh.D. in Electrical and Computer Engineeringfrom Northwestern University in 2005. Dr. Huangcurrently leads the Network Communicationsand Economics Lab (ncel.ie.cuhk.edu.hk). Heis the co-recipient of five best paper awards,including the 2011 IEEE Marconi Prize PaperAward in Wireless Communications .

Dr. Huang currently serves as Editor of IEEETransactions on Wireless Communications, Editor of IEEE Journal onSelected Areas in Communication - Cognitive Radio Series, and Chairof IEEE ComSoc Multimedia Communications Technical Committee.

http://en.wikipedia.org/wiki/Mobile_virtual_network_operator

http://www.mobilein.com/what_is_a_mvno.htm

http://www.mobileisgood.com/mvno.php

http://jianwei.ie.cuhk.edu.hk/publication/DynamicCVNOTechReport.pdf

15

Shuo-Yen Robert Li S.-Y. Robert Li (Bob Li)received the PhD degree from UC Berkeley in1974. He taught applied math at MIT in 1974-76and math/statistics/CS at U. Illinois, Chicago in1976-79. After working at Bell Labs/Bellcore fora decade, he joined CUHK as a chair professorin 1989. He now also serves as Co-director ofInstitute of Network Coding as well as honoraryprofessor in a few universities. Bob initiated alge-braic switching theory and is a cofounder of net-work coding theory. His ”martingale of patterns”

(1980) is widely applied to genetics.

APPENDIX APROOF FOR THEOREM 1 (A)We first prove (34) by induction.

It is easy to see that at slott = 0, no packets are inthe network andQ(0) = 0, thus the queue length bound(34) obviously holds. Now suppose (34) holds for timet. Weconsider the queue length bound in slott+1 in the followingcases:

• Case 1: Q(t) ≤ V qmax, then clearlyQ(t+1) ≤ V qmax+Amax.

• Case 2: Q(t) > V qmax, i.e., the objective value of therevenue maximization (20) is negative. Therefor, accord-ing to the optimization solution, the operator will setO(t) = 0, and do not accept any new request,A(t) = 0.Therefore,Q(t+ 1) ≤ Q(t) ≤ V qmax +Amax.

Likewise, we can also prove the virtual queue bound (35)by induction. Suppose that the inequality (35) holds for timet, and consider the following two cases.

• Case 1: Zi(t) ≤ Zmax−1, clearly the virtual queue lengthbound (35) also holds at slott+ 1.

• Case 2: Zi(t) > Zmax − 1, i.e.,

Zi(t) ≥ maxCs

rmaxQmaxαi(t)

(1− p0)Pmd(Cs(t))(53)

Sincermax ≥∑

i∈Bmaxri(t) =

∑i∈Bmax

log(ωi(t)hi(t)

λ(t)

),

then inequality (53) implies

Ci(t)=Cs(t)+Zi(t)1−p0V

>αi(t) log

(ωi(t)hi(t)

λ(t)

).

(54)

By (32) in the PMC policy, channeli will not bechosen for sensing and transmission, thus there will beno collision in this channel,i.e., Xi(t+1) = 0. Then by(13) we haveZi(t + 1) ≤ Zi(t) and the virtual queuelength bound (35) also holds att+ 1.

APPENDIX BPROOF FOR THEOREM 1 (B)We first construct astationary randomized policy that canachieve the optimal solution of the PM problem. Let usconsider a special class of stationary randomized policies,called the φ-only policy, which makes the decisionγ(t)in slot t only depending on the observation of systemparameterφ(t). The stationary distribution for the observ-able parameterφ(t) is denoted as{Πφ, φ ∈ Φ}. (Recall

that we defineφ(t)∆= (M(t),h(t), Cl(t)) and γ(t)

∆=

(O(t), q(t), Cs(t),Bs(t),Bl(t),P (t)). Note that the value ofφ(t) can be chosen only from a finite setΦ.) In the φ-onlypolicy, when the operator observesφ(t) = φ, it choosesγ(t)from the countable collection ofΓφ(t) = {γ1

φ, γ2φ, . . . } with

probabilities{ρ1φ, ρ2φ . . . }, where

∑∞u=1 ρ

uφ = 1. Note that the

decision is independent of timet, and thus is stationary. Wehave the following fact:

There exists a stationaryφ-only policy that achieves theoptimal profit of the PM problem while satisfying stability

16

condition (11) and collision upper-bound requirement (12),which is the solution of the following optimization problem:

R∗= Maximize

∑

φ∈Φ

Πφ

∞∑

u=1

R(γuφ ;φ)ρ

uφ

Subject to∑

φ∈Φ

Πφ

∞∑

u=1

r(γuφ ;φ)ρ

uφ ≤

∑

φ∈Φ

Πφ

∞∑

u=1

D(γuφ ;φ)ρ

uφ

∑

φ∈Φ

Πφ

∞∑

u=1

Xi(γuφ ;φ)ρ

uφ ≤ ηi, i ∈ Bs

max

The above fact is a special case of Theorem 4.5 in [30]. Theproof is omitted for brevity.

Recall that the PMC policy is derived by minimizing theright hand side of the following inequality

∆(Θ(t))− V E [RPMC(t)|Θ(t)] ≤ D − V E [R(t)|Θ(t)]

+Q(t)E [O(t)A(t) − r(t)|Θ(t)]

+∑

i∈Bsmax

Zi(t)E [Xi(t)− ηi|Θ(t)] . (55)

In other words, given the current queue backlogs for eachslot t, the PMC policy minimizes the right hand side of (55)over all alternative feasible policies that could be implemented,including the optimal stationaryφ-only policy. Therefore, byplugging the optimal stationaryφ-only policy in the right handside of (55), we have

∆(Θ(t))− V E [RPMC(t)|Θ(t)] ≤ D − V R∗. (56)

Now we use the following lemma to obtain the performancebound in Theorem 1(b).

Lemma 1: (Lyapunov Optimization) Suppose there are fi-nite constantsV > 0, D > 0, such that for all time slotst ∈ {0, 1, 2, . . .} and all possible values ofΘ(t), we have

∆(Θ(t))− V E [R(t)|Θ(t)] ≤ D − V R∗. (57)

Then we have the following result

lim supt→∞

1

t

t−1∑

τ=0

E [R(t)] ≥ R∗−

D

V. (58)

The above lemma is a special case of Theorem 4.2 in [30].The proof is omitted for brevity.

Note that the inequality (56) is exact the condition (57)in Lemma 1, thus the performance bound in Theorem 1(b)immediately follows.

APPENDIX CPSEUDO CODE OF ALGORITHM 5APPENDIX DIMPACT OF QUEUEING ON THE REVENUE MAX-IMIZATION PROBLEM

What is the impact of the queuing effect on the pricingin the revenue maximization problem (20)? Let’s considerthe following instantaneous revenue maximization problemwithout the queueing shift.

Maximize qD(q,M). (59)

For simplicity, we ignore the time index in the discussion.

Algorithm 5 Search ThresholdΥl (or Υs) for a givenCs(t)

1: procedure SEARCHINGTHRESHOLD(Cs)2: Rearrange the channel indecenti ∈ Bl

max (or i ∈Bsmax) as the decreasing order ofgi(t).

3: m← |Blmax| (or m← |Bs

max|), λ⇐ Λ(m)4: while λ ≥ gm(t) do5: if m > 1 then6: m← m− 17: λ⇐ Λ(m)8: else9: break

10: end if11: end while12: Υl ← m (or Υs ← m)13: end procedure

Note that both problems in (20) and (59) may have multipleoptimal solutions. For the purpose of obtaining intuitions, wewill restrict our discussion to the case where there is a uniqueoptimal price for both (20) and (59). To guarantee this, weassume that revenueR(D)

∆= q(D)D is a strictly concave

function of the demand5, whereq(D) is defined as the inverse

demand function,i.e., q(D)∆= max{q : D(q,m) = D}6 for

a given m. For simplicity, we denote the optimal price inrevenue maximization (20) asq∗, and the optimal price inrevenue maximization problem (59) with the queuing shift asq∗∗. We will show thatq∗∗ ≥ q∗, i.e., the queuing effect leadsto a higher price.

The objective of revenue maximization problem in (59) canbe represented asR(D) = q(D)D, and its optimal demandis denoted asD∗. By the first order optimality condition,D∗ satisfies thatR′(D∗) = 0, whereR′(·) denotes the firstorder derivative ofR(·). When the queuing effect is takeninto consideration, the objective of (20) can be representedasR(D) − DQ

V, and we denote its optimal demand asD∗∗.

Again by the first optimality condition,D∗∗ satisfies thatR′(D∗∗) = Q

V≥ 0. Since the revenue functionR(D)

is concave inD, R′(D) is a decreasing function. SinceR′(D∗∗) ≥ R′(D∗), we obtainD∗∗ ≤ D∗. Furthermore, sincethe demandD(q,m) is non-increasing in priceq for a givenm, we haveq∗∗ ≥ q∗. In other word, when incorporating thequeuing effect, the optimal dynamic priceq∗∗ in the PMCpolicy is higher than the optimal priceq∗ in instantaneousrevenue maximization problem without the shift. Moreover,the larger the queue lengthQ, the higher the dynamic priceq in the PMC policy. When we perform such pricing in thesystem, a high price will decrease the demand, which will slowthe increase of the queue length. Thus the dynamic pricing inthe PMC policy also performs the functionality of congestioncontrol to some extent.

For a more general case where the concavity assumptionmay not be satisfied, the queueing effect depends on the shape

5. This assumption is common in the revenue management literature (e.g.,[35]) to guarantee unique optimal pricing.

6. Since the demand functionD(q,m) is non-increasing inq, there can bemultiple prices resulting the same demand.

17

of the revenue function at the pointD∗, i.e., q∗∗ ≥ q∗ if andonly if R′(D∗) ≤ Q

V.

dynamic proﬁt maximization of cognitive mobile virtual ... · secondary operator obtains...

Documents