diﬀerential turbo coded modulation with app channel …

Differential Turbo Coded Modulation with APP Channel

Estimation

Sheryl L. Howard and Christian Schlegel

Dept. of Electrical & Computer Engineering

University of Alberta

Edmonton, AB Canada T6G 2V4

Email: sheryl,[email protected]

December 21, 2003

Abstract

A serially concatenated coding system which can operate without channel state information(CSI) with use of a simple channel estimation technique is presented. This channel estima-tion technique utilizes the inner decoder’s a posteriori probability (APP) information aboutthe transmitted symbols to form a channel estimate for each symbol interval, and is termedAPP channel estimation. The serially concatenated code (SCC) is comprised of an outer rate2/3 binary error control code, separated by a bit interleaver from an inner code consistingof an 8-PSK bit mapping and differential 8-PSK modulation. Coherent decoding results inperformance 0.6 dB from 8-PSK capacity for large interleaver sizes. APP channel estimationdecoding without CSI over constant, random walk and linear channel phase models showsnear-coherent results, with fractions of a dB loss in turbo cliff for the random walk phasemodel.

Keywords: Differential modulation, serial concatenated codes, channel phase estimation, iterativedecoding.

1 Introduction

With the advent of turbo codes [1], [2] and iterative decoding of serially concatenated codes [4],

the push for near-capacity performance of error control codes has become a reality. Use of these

near-capacity codes with iterative decoding now allows receiver operation in very high noise/low

signal-to-noise (SNR) environments.

Turbo trellis coded modulation (TTCM) has shown performance gains with higher order mod-

ulation similar to those of binary turbo coding. Both parallel concatenated TCM with bit inter-

leaving [5] and symbol interleaving [6] using 8-PSK, 16-QAM and 64-QAM modulation and serial

concatenated TCM for 8-PSK [7], [8] have been examined.

However, at the low SNR values achievable with turbo codes, issues such as phase synchronization

become critical, especially for higher order modulations. Conventional phase synchronization utilizes

a PLL (phase-locked loop) or Costas loop [9], resulting in phase ambiguities for PSK constellations.

In addition, the squaring loss for higher order PSK modulation becomes significant, effectively

prohibiting the use of these mechanisms for phase synchronization at the low SNRs at which turbo

codes can operate. For 8-PSK suppressed-carrier signalling, the squaring loss of an eighth-power-law

device at Es

N0

=9 dB is upper-bounded by 10 dB [10] with respect to PLL operation on an unmodulated

carrier. Typical loop SNRs must be at least 6 dB to achieve synchronization [11], [12]; thus the

eighth-power squaring device must see a minimum loop SNR of 16 dB to possibly achieve lock.

This requires narrow loop bandwidth, which is not compatible with fast tracking and acquisition as

needed for wireless packet systems. Effective phase synchronization for iteratively decoded systems

utilizing higher order PSK modulation becomes highly problematic.

The classical method of eliminating phase synchronization is differential M-PSK encoding with

differential demodulation; however, a 3 dB loss in SNR vs bit error rate (BER) occurs for M-PSK,

with M > 2 [13]. This also applies to turbo coding. Differential BPSK modulation resulted in a

2.7 dB loss in SNR [14] for a rate 1/2 turbo code using differential detection. Such significant loss

2

is counter to the near-capacity performance expected of turbo codes.

Various techniques to mitigate the penalty incurred by differential detection have been used,

such as multiple symbol differential detection [15], [16], [17], which has been applied to serially

concatenated codes with iterative decoding, using linear prediction to obtain channel estimates [18].

This technique results in an exponential expansion of the decoding trellis. An expanded-state

decoding trellis is also presented in [19] for serial concatenation of a rate 1/2 convolutional code

with differential M-PSK modulation. The channel phase is discretized into N states, resulting in

a linear trellis expansion of MN states. Similarly, iterative decoding of turbo codes with QPSK

modulation incorporates channel estimation for fading channels by using quantized phase in an

expanded ”supertrellis” in [20]. Instead of expanding the already complex APP decoding trellis,

channel estimation of PSAM (pilot-symbol-assisted modulation) BPSK turbo codes over fading

channels is accomplished outside the APP decoder in [21], yet still within the iterative decoding

block to allow for iteratively improving channel estimates. A BER of 10−4 within 0.5 dB of coherent

detection for slow fading is achieved.

Turbo-embedded estimation (TEE) is an alternate approach that has been investigated for BPSK

in [22] and extended to QPSK and 8-PSK [23]. This technique uses the most probable state

during the forward recursion of the APP decoder to obtain a symbol estimate, which is fed to

a simple tracking loop to compute an updated phase estimate. No state expansion of the APP

trellis, and corresponding increase in complexity, occurs. However, TEE requires an initial phase

estimate to begin decoding; this phase estimate is obtained from a known preamble whose length

is approximately 1.5% of the total packet length. Results over the AWGN channel, with phase

noise according to a recent DVB-S2 proposal [24], for rate 1/3 8-PSK modulation are 0.3 dB from

coherent decoding at a BER of 10−5, increasing with rate 1/2 8-PSK modulation to 0.7 dB from

coherent decoding at a BER of 2 × 10−3.

From a coding perspective, differential modulation may be viewed as a recursive convolutional

3

code [25] and, as such, can be used as a viable inner code in a serially concatenated coding (SCC)

system which will exhibit interleaver gain [4]. This idea has been used for binary SCCs [26], [25],

and shown to provide good performance when coherently decoded. In this paper, we approach the

construction of an SCC for higher order modulations from a code design perspective. We also show

how use of an inner differential M-PSK code leads in a natural way to a decoder which functions

without any prior channel phase information, providing performance nearly indistinguishable from

that of an ideal receiver with complete phase knowledge.

This decoder relies on the ability of the APP algorithm for the differential inner modulation code

to provide useful soft information on the information symbols without any prior phase knowledge,

even in the presence of significant time-varying channel phase offset, which is confirmed via EXIT

analysis [27], [28]. This soft information is used by a low-complexity phase estimator to iteratively

improve the channel metrics. Through iterations, decoding performance close to the complete phase

knowledge scenario is achieved with integrated APP channel phase estimation. This method uses

APP soft information from the differential decoder without expanding trellis complexity. Neither

external phase estimation, such as a training sequence preamble or non-data-aided (NDA) phase

estimation, nor differential demodulation are needed to begin the decoding process.

An EXIT analysis approach [27], [28] is used to find matched outer codes that work optimally

with the inner modulation code. The [3,2,2] parity check code is such a code, providing turbo cliff

results only 0.6 dB away [38] from the 8-PSK capacity at a system rate of 2 bits/symbol [33], [41].

An 8-PSK mapping specifically designed to optimize this system’s error floor [29] is presented.

We show the performance of our serially concatenated system compares favorably with two

coherently-decoded serially concatenated systems, using 8-PSK modulation at a system rate of 2

bits/symbol. These include 1) serial concatenated trellis coded modulation (SCTCM) [7], [8] and

2) soft-decision iteratively decoded bit-interleaved coded modulation (BICM-ID) [32].

The paper is organized as follows: Section 2 describes the serially concatenated binary error

4

control code/differential 8-PSK system, examining the encoder, simulated channel, and decoder.

Section 3 discusses analysis techniques used to optimize the system; distance spectrum analysis

allows improvement of the error floor through a new mapping, while EXIT analysis predicts turbo

cliff behavior and completes the error analysis. Section 4 presents our method of obtaining channel

estimates when decoding without CSI, which we term APP channel estimation. Section 5 presents

simulation results. Conclusions are discussed in Section 6.

2 System Description

2.1 Encoding System

The encoder for our proposed serially concatenated transmission system is shown in Figure 1. A

sequence of information bits u = [u0, . . . , uL−1] of length L is encoded by an outer rate 2/3 binary

error control code, such as the [3,2,2] parity code, into a sequence of coded bits v = [v0, . . . , v 3L

2−1]

of length 3L2

. The coded bits v are interleaved bitwise through a random interleaver [2],[4]. These

interleaved coded bits v′ are mapped to 8-PSK symbols w, wk =√

Esejθk ; the symbol energy

is normalized to Es = 1, and the 8-PSK symbols have unit magnitude and phase angle θk ∈

[0, π/4, . . . , 7π/4] rads. Initially, we assume natural mapping, which labels the bit combinations

000 − 111 increasing consecutively counterclockwise from 0 rads.

Figure 1 goes here.

The 8-PSK symbols are then differentially encoded, so that the current transmitted differential

symbol xk = wkxk−1. This differential 8-PSK encoding serves as the inner code of the serially con-

catenated system, and is viewed as a recursive non-systematic convolutional code [25]. A recursive

inner code is necessary to provide interleaving gain in a serially concatenated system [4]. This

inner code has a regular, fully-connected 8-state trellis, in which the trellis states correspond to the

5

current transmitted differential symbol. Transmission of a block of encoded symbols is initialized

with x0 = w0, and terminated with a reference symbol xL

2

=√

Esej0 to end in the zero state.

A simplified three-stage differential 8-PSK trellis section (corresponding to three symbol time

intervals) is shown in Figure 2. The actual trellis is fully connected but for clarity, only the first

stage shows all connections. Two decoding paths are depicted in the trellis section. The solid bold

path indicates the correct path, with the dashed bold path above it resulting from a phase rotation

of π/2 radians. Both the original 8-PSK symbols w and the transmitted D8-PSK symbols x are

shown as w, x above each corresponding trellis branch. Notice that while the π/2 phase rotation

affects the transmitted D8-PSK symbols x, the information 8-PSK symbols w are identical on both

paths. The trellis and code are rotationally invariant. This rotational invariance will be significant

when considering system decoding without phase synchronization.

Figure 2 goes here.

2.2 Channel Model

We use a complex AWGN channel model with noise variance N0/2 in each dimension. Initially, the

channel phase is assumed known at the receiver. The received symbols y consist of the transmitted

symbols plus complex noise of variance N0, i.e., y = x + n.

Next, the assumption of phase knowledge is dropped, and we consider an unknown phase offset

at the receiver. The channel now consists of complex AWGN n plus a possibly time-varying symbol

phase offset φ: yk = xkejφk + nk.

Three different phase models are considered:

• Constant but unknown phase offset φk = Φ, ∀ k.

This could model short packet transmission systems or frequency hopping, where the phase

could be assumed approximately constant over one transmission block.

6

• Gaussian random walk model. The unknown phase evolves as φk = φk−1 + ∆Φk, where the

phase process ∆Φk is given by zero-mean i.i.d. Gaussian random variables with variance σ2Φ.

This could model phase jitter from oscillator instability.

• Constant Frequency Offset ∆f . Phase offset equals frequency offset multiplied by time, thus

a constant frequency offset results in a linear phase offset of slope 2π∆f rads w.r.t. time. For

simulation purposes, we assume the phase offset constant during one symbol interval T ; the

phase increases by a discrete amount of 2π∆fT rads for each consecutive symbol interval as

φk = φk−1 + 2π∆fT . This could model a Doppler scenario.

2.3 Iterative Decoding

Iterative decoding of this serially concatenated system proceeds according to turbo decoding princi-

ples [2], [4]. Figure 3 displays a block diagram of the decoding process. The APP channel estimation

block is shown in the dashed rectangle, and is discussed in detail in Section 4. Note that differential

demodulation is never used at the receiver for detection with or without phase knowledge, when

APP channel estimation is employed. Assuming detection with known phase for now, received

channel symbols y are converted to channel metrics p(yk|xk) = 1

πN0

e−|yk−xk|2/N0 , which are fed to

the inner APP decoder for the differential code, along with a priori information Pa(w). No a priori

information is available for the inner APP decoder during the first iteration, thus uniform a priori

probabilities are used; Pa

(

wk = ej2lπ/8)

= 1/8, ∀l = 1, . . . , 8.

Figure 3 goes here.

The inner APP decoder uses the BCJR [36] or forward-backward algorithm operating on the 8-

state trellis of the differential 8-PSK code, depicted in Figure 2, to calculate conditional symbol

probabilities on both the 8-PSK symbols w and the transmitted D8-PSK symbols x. The APP

8-PSK symbol probabilities PAPP(w) are converted first to APP bit probabilities PAPP(v′) through

7

marginalization, for example,

Pe(v′i = 0) =

∑

wk:v′i=0

Pe(wk(v′i = 0, v′

i+1, vi+2)) (1)

The APP bit probabilities are converted to APP LLRs λAPP(v′). Extrinsic output LLRs λe(v′) are

obtained as usual by subtracting the a priori LLRs λa(v′) from the APP LLRs λAPP(v′).

Deinterleaving the extrinsic LLRs λe(v′) provides a priori LLRs λa(v) as input to the outer APP

decoder. The outer APP decoder for a simple binary code may be implemented with low complexity;

for example, the [3,2,2] parity code examined in this paper is simple enough, with only 3 bits and

4 possible codewords, to implement the 6 APP equations explicitly. The APP probabilities that

express the parity constraints of the code are given by

PAPP(v1 = 0) ∝ Pa(v1 = 0) · (Pa(v2 = 0)Pa(v3 = 0) + Pa(v2 = 1)Pa(v3 = 1)) (2)

PAPP(v1 = 1) ∝ Pa(v1 = 1) · (Pa(v2 = 0)Pa(v3 = 1) + Pa(v2 = 1)Pa(v3 = 0)) (3)

and analogously for v2 and v3. APP LLRs λAPP(v) are computed from these, and extrinsic LLRs

λe(v) obtained by subtracting off the a priori LLRs λa(v).

These extrinsic LLRs λe(v) are then interleaved to obtain new a priori LLRs λa(v′). These

must be converted first to bit probabilities Pa(v′), then to a priori symbol probabilities Pa(w) for

use in the next iteration of decoding in the inner APP symbol decoder. Symbol probabilities are

calculated as the normalized product of their component bit probabilities. In this fashion, with

inner and outer APP decoders exchanging extrinsic information each iteration, iterative decoding

continues until convergence or a fixed number of iterations is reached, at which time a hard decision

on the APP information bit LLRs λAPP(u) from the outer binary APP decoder determines the

estimated information sequence u.

8

3 Code Properties and Performance Analysis

Turbo code performance may be divided into three regions. In the first, the low SNR/high BER

region, the turbo code does not perform well and iterative decoding has minimal effect. The turbo

cliff or waterfall region follows, where the BER performance drops sharply to low values in only

fractions of a dB SNR increase. Finally, at high SNR, there may be an error floor or flare where

the BER curve flattens out due to the predominance of low-weight error events.

The latter two regions, the turbo cliff and error floor regions, require separate methodologies

to analyze concatenated code performance. Extrinsic mutual information transfer, or EXIT, anal-

ysis [27], [28] is used to accurately predict turbo cliff onset SNR. Mutual information serves as a

reliability measure of the soft information into and out of each component decoder.

The second method is the minimum distance asymptote approximation [40] of the error perfor-

mance in the high SNR error floor region, where performance of turbo coded systems flattens out by

following the error curve of the most likely, minimum-weight sequence error event. Both methods

are used to demonstrate the superior behavior of serially concatenated coded modulation and to

optimize system performance. We first examine our system via minimum distance analysis.

3.1 Minimum Distance Analysis

The trellis of a differential 8-PSK encoder is fully connected, so that the shortest error event is

always only 2 branches long. With respect to the all-zeros sequence, the 7 possible two-branch error

events are listed below; the first symbol in each branch is the original 8-PSK symbol and the second

symbol is the differential 8-PSK symbol, as shown in Figure 2.

Noting that merging branches all carry the same output symbol, only the diverging branch con-

tributes to the minimum squared Euclidean distance (MSED). The two-branch error event MSED

is simply that of naturally mapped 8-PSK, 0.586, found in (1) and (7). Without an outer code, the

9

input sequences to the D8-PSK encoder are unconstrained, and the D8-PSK MSED is 0.586.

(1) 1 − 1 7 − 0 (4) 4 − 4 4 − 0 (7) 7 − 7 1 − 0

(2) 2 − 2 6 − 0 (5) 5 − 5 3 − 0

(3) 3 − 3 5 − 0 (6) 6 − 6 2 − 0

The 8-PSK mapping is not regular, i.e., all symbols separated by a given squared Euclidean distance

(SED) do not have the same Hamming distance between them. For example, rotating 45o from one

symbol to the next gives a SED=0.586, but Hamming distance dH varies from 1 (between 000 at

0 rads and 001 at π/4 rads) to 3 (between 000 at 0 rads and 111 at 7π/4 rads). The distance

between the all-zeros sequence and an error sequence is not representative of the distance between

all correct and incorrect paths when the symbol mapping is not regular. Thus we must consider

all sequences, not just the all-zeros sequence, recognizing that parallel branches of the differential

trellis carry identical information symbols due to the rotationally invariant trellis and thus parallel

paths are equivalent. However, consideration of the all-zeros sequence is sufficient for the moment

to show the MSED of the differential code.

Calculation of the minimum distance of turbo coded systems with random interleavers is sig-

nificantly more involved than considering the minimum distance of the component codes [4], [40].

The input sequences to the inner decoder for our serially concatenated system are constrained to be

interleaved codewords of the outer [3,2,2] parity-check code, which all have Hamming weight dH=2,

and thus the entire interleaved sequence is constrained to even Hamming weight.

This system is expected to manifest a rather flat error floor due to the very low MSED of 0.586;

simulation results presented later will demonstrate such an error floor.

We now consider the MSED error events, with the goal of better matching the 8-PSK mapping

to the parity code sequence constraints to reduce the most likely error events. The minimum length

10

detour of the inner code is 6 bits long for a two-branch error event. If natural 8-PSK mapping is

used, the seven possible bit sequences for the two-branch error events with the all-zeros sequence

as reference are given in the left side of Table 1.

Table 1 goes here.

Five out of seven of the two-branch detours in Table 1 have even weight, and are permissible

sequences. An 8-PSK signal mapping that generates primarily odd-weight two-branch error events

would lower the probability of choosing a MSED sequence, and be better matched to the outer

parity check code. Such an improved mapping, presented in [29], is given in the center section of

Table 1. The input bit sequences for the two-branch error events using the improved mapping with

respect to the all-zeros sequence are shown on the right.

The improved mapping has only one even-weight sequence, 010-010, which generates a two-

branch error event with a squared Euclidean distance (SED) of 4.0. The sequences 111-101 and

101-111 both have SED of 0.586, but are not eligible as two-branch error events because they are of

odd weight. When we consider all other reference sequences besides the all-zeros sequence, there is

only one two-branch error event resulting in MSED=0.586 and minimum dH=2. This MSED error

event occurs with the sequence pair v′, v′=010-011, 011-010.

This improved mapping has the same number of even-weight MSED two-branch error events, but

far fewer (1 vs. 16) dH=2 MSED error events. It can be shown [30] that a random interleaver is far

more likely to contain a permutation allowing a dH=2 two-branch error event for both these map-

pings, with probability independent of interleaver length, than one for dH=4 or 6, which decreases

as O(N−1). The improved mapping significantly reduces the number of dH=2 MSED two-branch

error events compared to natural mapping. This reduction of MSED multiplicity lowers the error

floor as will be seen in Section 3.3. It does not increase the MSED of the code, which remains at

0.586 with high likelihood.

Use of a spread interleaver lowers the error floor further. An S-random interleaver [31] with

11

spreading S ≥ 6 will prevent the occurrence of a single two-branch error event, though it cannot

prevent the occurrence of two two-branch error events. The MSED of the concatenated code with

a S-random spread interleaver of S ≥ 6 is 1.172.

3.2 EXIT Analysis

EXIT (extrinsic information transfer) analysis [27], [28] is a valuable technique for evaluating con-

catenated system performance in the turbo cliff, or waterfall, region. The mutual information

I(X; E) between symbols X and the extrinsic soft information E with regards to X is used as a

measure of the reliability of E generated by each component decoder. Likewise, I(X; A) measures

the reliability of the a priori soft information A into the component decoder, with regards to X.

Input a priori LLRs A are generated assuming a Gaussian distribution for p(A). Given A,

with an associated I(X; A) = IA, the component APP decoder will produce E, with associated

I(X; E) = IE . The inner decoder also requires channel metrics on the transmitted symbols, and

thus is dependent on the channel SNR. The outer decoder of a serially concatenated system never

sees the channel information and is independent of SNR. In this manner, an EXIT chart displaying

IA, ranging from 0 to 1, versus IE for the APP decoder is produced. These component EXIT charts

are used to study the convergence behavior of concatenated iteratively decoded systems.

The interleaving process does not alter mutual information; while scrambling the soft informa-

tion, interleaving does not change its distribution. In addition, interleaving destroys any correlation

between successive symbols. This separation and independence between the two decoders allows the

component decoder EXIT charts to be combined into a single EXIT graph depicting the iterative

behavior of the turbo decoding process. Each component decoder is simulated individually, without

the need for implementation and simulation of the actual concatenated system. EXIT analysis

allows the component codes to be chosen for optimization of concatenated system performance,

without lengthy simulation of each code combination.

12

For our concatenated system, the outer parity code produces binary soft information which can

be processed as LLRs Ao and Eo, with associated I(Ao; V ) = IAoand I(Eo; V ) = IEo

. As the

interleaver operates bitwise, the extrinsic interleaved bit LLRs will be passed on to the inner D8-

PSK APP decoder. However, as the inner decoder operates on symbols, these interleaved LLRs Ai

(or their corresponding bit probabilities Pa(v′)) must be converted to symbol probabilities Pa(w) as

discussed in Section 2. Conversely, the inner APP extrinsic information will be symbol probabilities

Pe(w), which must be converted to bit probabilities Pe(v′), or their corresponding LLRs Ei, for

deinterleaving. I(Ai; V′) = IAi

and I(Ei; V′) = IEi

are calculated for the inner LLR values.

Figure 4 displays the EXIT chart for our serially concatenated system with the [3,2,2] parity

check code as outer code (IEoon the horizontal axis, and IAo

on the vertical axis) and differential

8-PSK as the inner code (with axes swapped). The inner decoder EXIT curves depend on SNR,

and are shown for 3.4 and 3.6 dB. The improved 8-PSK mapping discussed in Section 3.1 is used.

Natural 8-PSK mapping allows for earlier turbo cliff onset at SNR 3.4 dB, but produces a higher

error floor. An EXIT trajectory for the improved mapping at SNR 3.6 dB is shown also. All curves

are simulated using a 180000 bit interleaver.

Figure 4 goes here.

Notice the close fit between the outer [3,2,2] parity check and inner differential 8-PSK EXIT curves

at SNR 3.4 dB. The two codes are well-matched, in the sense that the combined codes minimize

turbo cliff onset compared to a set of less well-matched codes.

An EXIT curve for an outer rate 2/3 16 state maximal free distance recursive systematic convo-

lutional code is also shown in Figure 4, together with an inner differential 8-PSK EXIT curve for

SNR 5 dB. Natural mapping is used. The free distance of this convolutional code is 5, compared to

a free distance of 2 for the [3,2,2] parity check code. The increase in free distance of the outer code

increases the minimum distance of the concatenated code and significantly lowers and reduces the

error floor which exists with the parity check code. However, reduction of the error floor comes at

13

a large increase in turbo cliff onset SNR.

As shown in Figure 4, pinchoff for the convolutional code and differential 8-PSK modulation

occurs at SNR 5 dB, 1.7 dB past that of the concatenated system with the outer [3,2,2] parity code.

It is clear from the EXIT curve of the outer rate 2/3 convolutional code that the differential EXIT

curve must be raised significantly higher by increasing SNR to clear the outer EXIT curve, and

provide a channel for iterative convergence. This increase in turbo cliff onset is due to the poor

match, as shown by EXIT chart, between component decoders.

3.3 Code Performance

Figure 5 shows the performance of our proposed serially concatenated coded modulation system

with both natural and improved 8-PSK mapping and random interleaving. Results are shown for

interleaver sizes of 15000 bits and 180000 bits. Notice the lowered error floor of the improved

mapping. Also shown is the improved mapping with a fixed S-random spread interleaver of S=9

and 15000 bits. The spread interleaver lowers the error floor further. At a rate of 2 bits/symbol, 8-

PSK capacity is at Eb/N0 = 2.9 dB [33], [41]. Our concatenated system provides BER performance

0.6-0.8 dB away from capacity for large interleaver size.

The BER performance of our concatenated system underscores the effectiveness of simple design

techniques, combined with turbo coding analysis techniques, in designing and optimizing systems

for excellent performance. Not only are the two component decoders very simple to implement

(an 8-state trellis decoder for the inner code and a lookup table for the outer code) but taken

individually, their error control potential is virtually zero. The [3,2,2] parity check code cannot

correct even single errors, and differential 8-PSK modulation alone is a very weak code. Together,

however, they unfold the full potential of turbo coding, outperforming even large 8-PSK Ungerbock

trellis codes [34] by 1 dB. A 64-state 8-PSK TCM code achieves an asymptotic coding gain over

uncoded QPSK of 5 dB at a rate of 2 bits/symbol. As shown in Figure 5, our SCC system provides

14

performance results 6 dB better than uncoded QPSK at a BER=10−5.

Along the turbo cliff, 50 decoding iterations are required for convergence. EXIT analysis pre-

dicted that a large number of iterations along the turbo cliff would be required for convergence,

due to the well-matched EXIT curves of the component codes, which result in a low SNR turbo

cliff onset but provide only a narrow tunnel for iterative improvement. Convergence above 4.5 dB

occurs in 10 iterations or less. A minimum of 50 errors per data point were collected.

As predicted by EXIT analysis, the larger interleaver size shows turbo cliff onset at SNR=3.3

dB for natural mapping and 3.5 dB for the improved mapping. Natural mapping provides a 0.2 dB

advantage in turbo cliff onset, at the cost of a higher error floor. For the smaller interleaver size,

along the turbo cliff, convergence requires 50 iterations; at SNR 4 dB, 30 iterations are required

for convergence, decreasing to 10 iterations at SNR 5 dB and above. EXIT analysis also predicted

these rates of convergence.

Figure 5 goes here.

Our coherently decoded SCC achieves a BER of 2× 10−6 at an SNR of 3.9 dB, with an interleaver

size of 15,000 bits using the improved 8-PSK mapping. In comparison, the serially concatenated

TTCM system presented in [7], which assumes phase synchronization, provides a BER of 2 × 10−5

at SNR=3.7 dB, the largest SNR value simulated, using an interleaver size of 16,385 bits and 8-PSK

modulation at a rate of 2 bits/symbol over the AWGN channel.

Bit-interleaved coded modulation using iterative decoding (BICM-ID) has been examined for 8-

PSK modulation [32]. At a rate of 2 bits/symbol over an AWGN channel with phase synchronization,

BICM-ID achieves a BER of 3 × 10−5 at SNR=4.2 dB with an interleaver of 6000 bits.

Our coherently decoded SCC provides comparable performance to the SCTCM system, and

superior performance to BICM-ID, and in addition, offers the potential for decoding without CSI.

We now examine system performance in the presence of phase noise, without CSI, making use of

our APP channel estimator.

15

4 Decoding Without Channel Information

Accurate phase acquisition and tracking on physical channels at low SNR to achieve coherent

decoding is no easy task. Implementation of the required algorithms often consumes more VLSI

area than the decoder itself. Thus, we now consider the case when the received channel phase

is unknown; decoding must be performed without a priori CSI, i.e., ”non-coherently”. A key

observation is that the rotationally invariant property of the inner code allows us to extract a

posteriori information on the information symbols wk, even without knowledge of the carrier phase.

This soft information is provided as symbol probabilities from the inner APP decoder.

Figure 6 shows the EXIT chart for the differential 8-PSK decoder with various phase offsets

at SNR=4.5 dB. Neither differential detection nor pilot symbols were used, and external channel

information is not used, i.e., phase synchronization is not assumed. The D8-PSK APP decoder

provides some extrinsic information in the presence of phase offset, even when no a priori information

is available (along the vertical axis, corresponding to the initial iteration, when IAi= 0). Even the

worst case offset between two symbols, π/8 rads, still provides some extrinsic information initially.

The presence of extrinsic information without any a priori information is significant; no external

method of generating initial phase information will be necessary, such as a training sequence or

non-data-aided (NDA) phase estimator, nor will differential demodulation be needed.

This initial extrinsic information is not sufficient, however, to complete convergence; as seen in

Figure 6, the component EXIT curves intersect, and decoding will be stuck there at a high bit error

rate for all except φ = 0o. Hence we propose to use this soft information from the inner decoder to

estimate and track the channel phase through successive iterations. As we shall show, this leads to

a decoder which achieves convergence even in the presence of significant channel phase offset.

Figure 6 goes here.

The inner APP decoder generates both 8-PSK symbol probabilities P (w) and D8-PSK symbol

16

probabilities P (x); the latter will be used as input to a channel estimator for subsequent iterations.

Low complexity is important, given the emphasis on implementation simplicity leading to our choice

of component codes. The channel estimator complexity must not overshadow that of the decoding

system, and thus an optimal linear estimator such as the minimum mean square error (MMSE)

estimator is not considered. We choose a simpler filtering estimator, presented in [43], [29], [38].

The channel model used is AWGN with complex noise variance N0 and a complex time-varying

channel gain h. In the case of unknown channel phase, h is a unit-length time-varying rotation.

The received signal y is given as

yk = hkxk + nk; hk = ejφk , nk : N (0, N0) (4)

From the first moment equation, E[yk] = hkE[xk], where E[xk] is the expectation over the a

posteriori symbol probabilities P (xk) at time k from the inner APP decoder, an instantaneous

channel estimate hk may be found as

hk = ykx∗k (5)

where xk is the mean E[xk] modified to unit modulus, i.e., xk = E[xk]/|E[xk]|. At each iteration,

the inner APP decoder sends soft probability estimates of the channel symbols xk to the channel

estimator, which calculates the instantaneous channel estimates according to Equation 5. These

channel estimates are then filtered through a lowpass digital filter fk, whose bandwidth allows

tracking of the phase noise process, to generate the filtered channel estimates hk = hk ? fk. For a

constant phase offset, hk = (1/N)∑N

k=1hk, where N is the frame length, is simply the average of

the instantaneous channel estimates. For a time-varying Markov phase process such as the random

walk or linear phase processes, the channel estimates are filtered through an exponentially decaying

moving average filter as hk = (1 − α)∑k

j=1αk−jhj , where α, the exponential decay parameter, is

typically close to 1. These filtered channel estimates are used to generate coherent decoder branch

17

metrics P (yk|xk) ∝ exp(

−|yk − hkxk|2)

to be used by the inner APP decoder.

No channel information is yet available during the initial iteration, so hk = 1 is chosen for the

initial branch metrics. As we will show, each iteration improves the a posteriori values P (x) from

the inner decoder, and thus an improved channel estimate h can be recalculated, as long as the

SNR is above the turbo cliff of the coherent decoder.

As mentioned previously, the differential trellis is rotationally invariant to integer multiples of

π/4 rads phase shifts. If, however, in decoding the trellis, the beginning and final trellis states are

assumed fixed to state 0 as is commonly done, endpoint errors will occur with a phase shift. The

rest of the trellis will shift to a rotated sequence, but the endpoints cannot shift and remain fixed,

causing errors. To prevent this, we use a “floating” decoding trellis for the inner APP decoder, with

both beginning and final states assumed unknown and set to uniform probabilities.

Figure 7 illustrates the rotational invariance, with a sample random walk phase process at top

and the final APP phase estimate beneath. Twice, the phase estimate “slips” to a phase rotated by

−π/4 rads from the random walk phase. However, no decoding errors occur at these phase slips;

the rotationally invariant inner trellis ensures there are no decoding errors if the phase estimator

converges to a rotated phase, and the outer decoder cancels the bit errors at the phase jumps, so

the phase estimator seamlessly slips between constellation symmetry angles.

5 Channel Estimation Simulation Results

We now examine system performance in the presence of phase noise, without CSI, making use of

our APP channel estimator. Three different channel phase processes are simulated:

• Constant Phase Offset: φk = φ for φ = π/16 rads

• Random Walk Phase Process: φk = φk−1 + Φk, Φk : N (0, σ2Φ) for σ2

Φ = 0.05 deg2.

18

• Linear Phase Process: φk = φk−1 + 2π∆fT . We consider ∆fT=0.001 and 0.0001.

All channel models include AWGN. Decoding without channel state information is achieved using

our integrated APP channel phase estimation algorithm. All channel phase estimation results use

the same S-random interleaver with minimal spreading S = 3, size 15000 bits, and are compared

with coherent results for the same interleaver. Fifty decoding iterations are used.

Figure 8 compares the BER performance of decoding with and without CSI over an AWGN

channel with a constant channel phase offset of π/16, with near-coherent results. The constant

phase channel estimate h for the entire frame is an average of the instantaneous channel estimates h

as described in Section 4. Performance degrades somewhat as the phase offset approaches π/8. This

is a metastable point, as π/8 is halfway between two valid differential sequences. The instantaneous

channel estimates h will oscillate to either side of the π/8 boundary, and convergence to the correct

phase estimate is very slow for a phase offset of exactly π/8 rads.

APP channel phase estimation results for a random walk phase process with σ2Φ = .05 deg2 are

compared to coherent decoding results in Figure 9. The channel phase estimation filter coefficients

are given by hk = (1 − α)∑k

j=1αk−jhj with exponential decay parameter α = .99. APP channel

estimation for the random walk phase process gives results 0.25 dB from coherent decoding along

the turbo cliff, where the 50 iterations used are not sufficient for convergence.

A constant frequency/linear phase offset may model oscillator drift or a mobile Doppler scenario.

A carrier frequency of 1 GHz and an oscillator drift of 1 ppm from a relatively poor quality oscillator

provide ∆f=1000 Hz. A symbol rate of 106 symbols/sec then corresponds to ∆fT = 10−3. A higher

quality oscillator with drift of 0.1 ppm gives ∆f=100 Hz and ∆fT = 10−4. Alternately, a velocity

of 110 km/hr, a typical highway speed, corresponds also to a Doppler shift of 100 Hz. We consider

both these values of constant frequency/linear phase offset, ∆fT = 10−3 and 10−4, with larger ∆fT

expected to negatively impact performance.

19

Simulation results for a linear phase process with ∆fT = 10−3 and 10−4 are also shown in Figure

9. The APP channel estimation for the linear phase process uses filter parameter α=0.995. Phase

estimation results are approximately 0.6 dB from coherent decoding for ∆fT = 10−3 and near

coherent performance for ∆fT = 10−4.

Figures 8 and 9 go here.

6 Conclusions

We have shown that a simple serially concatenated system combining an outer [3,2,2] parity check

code with an inner differential 8-PSK modulation code offers very good results with iterative de-

coding according to turbo principles. The rotationally invariant property of the inner code aids in

channel phase estimation, and supplies sufficient soft information from the inner APP decoder to

allow initial operation under unknown channel phase rotations.

A simple channel estimation procedure using this soft information from the inner APP decoder

achieves near-coherent performance without channel phase information, under both constant and

time-varying simulated phase processes. Neither pilot symbols nor differential demodulation are

used or needed. The random walk phase process channel estimation results in a turbo cliff shift of

about 0.25 dB to the right, while the linear phase process results in a 0.6 dB shift of the turbo cliff.

Both encoding and decoding portions of this system may be implemented with low complexity,

and could be used in conjunction with packet transmission, where short messages increase the need

for phase offset immunity. Due to a very low MSED, this simple code has a significant error floor.

An improved 8-PSK mapping lowers the error floor at a slight 0.2 dB SNR penalty in the turbo cliff

onset region. Use of a spread interleaver lowers the error floor further over random interleaving.

20

References

[1] C. Berrou, A. Glavieux and P. Thitimajshima, “Near Shannon limit error-correcting codingand decoding: Turbo codes”, Proceedings of the IEEE International Conference on Communi-cations (ICC) 1993, Geneva, Switzerland, 1993, pp. 1064-1070.

[2] C. Berrou, A. Glavieux and P. Thitimajshima, ”Near optimum error correcting coding anddecoding: turbo-codes”, IEEE Trans. Commun., vol. COM-44, no. 10, pp. 1261-1271, Oct.1996.

[3] S. Benedetto and G. Montorsi, ”Unveiling Turbo-Codes: Some Results on Parallel Concate-nated Coding Schemes,” IEEE Trans. Inform. Theory, vol. 43, no. 2, March 1996, pp. 409-428.

[4] S. Benedetto, D. Divsalar, G. Montorsi and F. Pollara, “Serial concatenation of interleavedcodes: Performance analysis, design, and iterative decoding”, IEEE Trans. Inform. Theory,44(3), May 1998, pp. 909-926.

[5] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, ”Parallel concatenated trellis codedmodulation”, Proc. IEEE International Conference on Communications (ICC) 1996, Dallas,TX, pp. 974-978, June 23-27, 1996.

[6] P. Robertson and T. Worz, ”Bandwidth-Efficient Turbo Trellis-Coded Modulation Using Punc-tured Component Codes”, IEEE Journal of Selected Areas in Communications, vol. JSAC-16,no. 2, pp. 206-218, Feb. 1998.

[7] D. Divsalar and F. Pollara, ”Turbo Trellis Coded Modulation with Iterative Decoding forMobile Satellite Communications”, IMSC 97, June 97.

[8] D. Divsalar and F. Pollara, ”Serial and Hybrid Concatenated Codes with Applications”, Intern.Symposium on Turbo Codes Brest, France, Sept. 3-5, 1997, pp. 80-87.

[9] J.P. Costas, ”Synchronous Communications”, Proc. IRE, vol. 44, Dec. 1956, pp. 1713-1718.

[10] S.A. Butman, J.R. Lesh, ”The Effects of Bandpass Limiters on n-Phase Tracking Systems”,IEEE Trans. Inform. Theory, vol. 25, June 1977, pp. 569-576.

[11] H. Meyr and G. Ascheid, Synchronization in Digital Communications, Vol. 1, Wiley Series inTelecommunications, 1990.

[12] F.M. Gardner, Phaselock Techniques, 2nd edition, John Wiley & Sons, 1979.

[13] J.G. Proakis, Digital Communications, 4th edition, McGraw-Hill, 2000.

[14] E.K. Hall and S.G. Wilson, ”Turbo codes for noncoherent channels”, Proc. GLOBECOM’97,Nov. 1997, pp. 66-70.

[15] D. Divsalar and M.K. Simon, ”Multiple symbol differential detection of MPSK”, IEEE Trans.Commun., vol. 38, pp. 300-308, Mar. 1990.

[16] D. Makrakis and K. Feher, ”Optimal noncoherent detection of PSK signals”, Electron. Lett.,vol. 26, pp. 398-400, Mar. 1990.

[17] M. Peleg and S. Shamai, ”Iterative decoding of coded and interleaved noncoherent multiplesymbol detected DPSK”, Electron. Lett., vol. 33, no. 12, pp. 1018-1020, June 1997.

[18] P. Hoeher and J. Lodge, “’Turbo DPSK’: Iterative differential PSK demodulation and channel

21

decoding,” IEEE Trans. Commun., 47(6), June 1999, pp. 837-843.

[19] M. Peleg, S. Shamai and S. Galan, “Iterative decoding for coded noncoherent MPSK com-munications over phase-noisy AWGN channel”, IEE Proc. Commun., vol. 147, Apr. 2000, pp.87-95.

[20] C. Komninakis and R. Wesel, ”Joint Iterative Channel Estimation and Decoding in Flat Corre-lated Rayleigh Fading”, IEEE J. Sel. Areas Commun., vol. 19, No. 9, Sept. 2001, pp. 1706-1717.

[21] M.C. Valenti and B.D. Woerner, ”Iterative channel estimation and decoding of pilot symbolassisted turbo codes over flat-fading channels,” IEEE J. Sel. Areas Commun., vol. 9, Sept.2001, pp. 1691-1706.

[22] S. Cioni, G. E. Corazza, and A. Vanelli-Coralli, ”Turbo Embedded Estimation with imperfectPhase/Frequency Recovery”, Proceedings of IEEE International Conference on Communica-tions (ICC) 2003, Anchorage, AK, May 2003.

[23] S. Cioni, G. E. Corazza, and A. Vanelli-Coralli, ”Turbo Embedded Estimation for High Or-der Modulation”, Proceedings of the 3rd International Symposium on Turbo Codes & RelatedTopics, Brest, France, Sept. 1-5, 2003, pp. 447-450.

[24] ESA DVB-S2 contribution, ”DVB-S2 Phase Jitter synthesis”, Jan. 2003.

[25] K. R. Narayanan and G. L. Stuber, ”A Serial Concatenation Approach to Iterative Demodu-lation and Decoding”, IEEE Trans. Commun., vol. COM-47, no. 7, pp. 956-961, July 1999.

[26] M. Peleg, I. Sason, S. Shamai and A. Elia, “On interleaved, differentially encoded convolutionalcodes”, IEEE Trans. Inform. Theory, 45(7), Nov. 1999, pp. 2572-2582.

[27] S. ten Brink, “Design of serially concatenated codes based on iterative decoding convergence”,in 2nd International Symposium on Turbo Codes and Related Topics, Brest, France, 2000.

[28] S. ten Brink, “Convergence behavior of iteratively decoded parallel concatenated codes”, IEEETrans. Commun. , 49(10), Oct. 2001, pp. 1627-1737.

[29] S. Howard, C. Schlegel, L. Perez, F. Jiang, “Differential Turbo Coded Modulation over Unsyn-chronized Channels”, Proc. of IASTED 3rd Int. Conf. on Wireless and Optical Communications(WOC) 2002, Banff, Alberta, Canada, 2002, pp. 96-101.

[30] S. Howard, ”Probability of Random Interleaver Containing dH=2,4 Two-Branch Error Patternsfor the Differential 8PSK/[3,2,2] Parity SCC”, unpublished report, HCDC Laboratory, Dept.of Electrical and Computer Engineering, University of Alberta, December 2003.

[31] D. Divsalar and F. Pollara, ”Turbo Codes for PCS Applications”, Proc. ICC’95, Seattle, WA,June 1995.

[32] X. Li, A. Chindapol and J.A. Ritcey, ”Bit-Interleaved Coded Modulation With Iterative De-coding and 8PSK Signaling”, IEEE Trans. on Commun., Vol. 50, August 2002, pp. 1250-57.

[33] G. Ungerboeck, ”Channel Coding with Multilevel/Phase Signals”, IEEE Trans. Inform. The-ory, vol. IT-28, Jan. 82, pp. 55-67.

[34] G. Ungerboeck, ”Trellis-coded modulation with redundant signal sets, Part I: Introduction,”IEEE Commun. Mag., vol. 25, No. 2, pp. 5-11, February 1987.

[35] G. Ungerboeck, ”Trellis-coded modulation with redundant signal sets, Part II: State of the

22

art,” IEEE Commun. Mag., vol. 25, No. 2, pp. 12-21, February 1987.

[36] L.R. Bahl, J. Cocke, F. Jelinek and J. Raviv, “Optimal Decoding of Linear Codes for Mini-mizing Symbol Error Rate”, IEEE Trans. Inform. Theory, vol. 20, Mar. 1974, pp. 284-287.

[37] C. Schlegel, Trellis Coding, IEEE Press, Piscataway, NJ, 1997.

[38] S. Howard, C. Schlegel, “Differentially-Encoded Turbo Coded Modulation with APP ChannelEstimation”, Proceedings of IEEE GlobeCom 2003, San Francisco, CA, December 1-5, 2003.

[39] S. ten Brink, ”Design of Concatenated Coding Schemes based on Iterative Decoding Conver-gence”, Ph.D. dissertation, Universitat Stuttgart, Shaker Verlag, Aachen 2002.

[40] L.C. Perez, J. Seghers and D.J. Costello, Jr., ”A distance spectrum interpretation of turbocodes”, IEEE Trans. on Inform. Theory, vol. 42, pp. 1698-1709, Part I, Nov. 1996.

[41] R.G. Gallager, Information Theory and Reliable Communications, New York, Wiley, 1968.

[42] A. Grant and C. Schlegel, ”Differential turbo space-time coding”, Proc. IEEE InformationTheory Workshop 2001, pp. 120-122, Cairns, Sept. 2001.

[43] C. Schlegel and A. Grant, ”Differential space-time codes”, IEEE Trans. on Inform. Theory,vol. 49, no. 9, Sept. 2003, pp. 2298-2306.

[44] T.M. Cover and J.A. Thomas, Elements of Information Theory, Wiley Series in Telecommu-nications, John Wiley & Sons, 1991

23

8PSK Natural Map even0 0 1 1 1 1 x0 1 0 1 1 00 1 1 1 0 1 x1 0 0 1 0 0 x1 0 1 0 1 1 x1 1 0 0 1 01 1 1 0 0 1 x

Improved 8PSK Mapping0 0 1 → 1 1 10 1 0 → 0 0 10 1 1 → 1 0 01 0 0 → 0 1 01 0 1 → 0 1 11 1 0 → 1 1 01 1 1 → 1 0 1

8PSK Improved Map even1 1 1 1 0 10 0 1 1 1 01 0 0 0 1 10 1 0 0 1 0 x0 1 1 1 0 01 1 0 0 0 11 0 1 1 1 1

Table 1: Bit sequences of two-branch error events for natural and improved 8-PSK mappings

(000)

(010)

(110)

(100)

(011)

(101) (111)

(001)uv v′

Interleaver[3,2,2]ParityEncoder

DifferentialModulator

D

× x

Figure 1: Serial Turbo Encoder for Differential Turbo Coded Modulation

statetime k − 1 k k + 1

0

1

2

3

4

5

6

7

wk, xk

↖2,2

↖2,41,3

1,52,5

2,7

statetime

7

6

5

4

3

2

1

0

Figure 2: 8-PSK Differential Trellis Section

24

p(y|x)

Inner

D8PSK

APP Soft

Decision

Decoder

Outer

Parity

APP Soft

Decision

Decoder

Pa(w)

PAPP(w|y)

Pa(v′)

PAPP(v′)

λa(v′)

λe(v′)

Symbl→Bit

Bit→LLR

Bit→Symbl

LLR→Bit

π

π−1λa(v)

λe(v)

-

-

PAPP(x|y)

xkyx∗h

Filter

Figure 3: Serial Turbo Decoder, D8PSK Inner Decoder and [3,2,2] Outer Decoder With APPChannel Estimation

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

IAinner

, IEouter

IEin

ne

r, IA

ou

ter

outer [3,2,2] parity codeinner D8PSK new map: SNR 3.6 dBinner D8PSK new map: SNR 3.4 dBouter rate 2/3 16 state ccinner D8PSK nat map: SNR 5.0 dB

ConvolutionalCode System

Parity Code System

Figure 4: EXIT chart for outer [3,2,2] parity code and inner differential 8-PSK modulation atSNR=3.4 and 3.6 dB, with improved 8-PSK mapping; decoding trajectory shown for SNR=3.6 dB.EXIT curve for outer rate 2/3 16 state convolutional code and differential 8-PSK at SNR=5 dBalso shown.

25

3 3.5 4 4.5 5 5.5 610

−7

10−6

10−5

10−4

10−3

10−2

10−1

100

D8PSK: natural mappingnat map:180k intlvnew map:180k intlvnat map:15k intlvnew map:15k intlvnew map: 15k S=9 spread intlv

BER: Bit Error Rate

SNR in dB

Eb

N0

Figure 5: Performance of the serially concatenated D8-PSK system with outer [3,2,2] parity code.

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

IAinner

,IEouter

IEin

ner,I

Aoute

r

D8PSK: 0o offset

D8PSK: 9o offset

D8PSK: 11.25o offset

D8PSK: 15o offset

D8PSK: 22.5o offset

[3,2,2] parity outer code

Figure 6: EXIT chart of the differential 8PSK decoder for various constant phase offsets, SNR=4.5dB, and [3,2,2] parity check decoder.

26

8PSK symbols0 1000 2000 3000 4000 5000

.1

.08

.06

.04

.02

0

-.02

-.04

-.06

-.08

-.1

Phas

ein

Rad

s

estimated phase: red

random walk phase: black

Figure 7: Random walk channel phase model and estimated phase with π/4 phase slips, decodedwithout errors.

coherent decodingAPP channel estimation

3 3.5 4 4.5 5 5.510−7

10−6

10−5

10−4

10−3

10−2

10−1

1BER: Bit Error Rate

Constant Phase Offset π/16 rads

SNR in dB

Eb

N0

Figure 8: BER results without CSI, phase offset=π/16 rads, using APP channel estimation, com-pared with coherent decoding; interleaver size=15k bits, fixed S=3 interleaver

27

3 3.5 4 4.5 5 5.5 610

−7

10−6

10−5

10−4

10−3

10−2

10−1

100

coherent decodingrandom walk phaselinear phase: ∆ f T=10−3

linear phase: ∆ f T=10−4

BER: Bit Error Rate

SNR in dB

Eb

N0

Figure 9: BER results without CSI, random walk and linear phase processes using APP channelestimation compared with coherent decoding; interleaver size=15k bits, fixed S=3 interleaver

28

diﬀerential turbo coded modulation with app channel …

Documents