
Shieh et al.: Design and Implementation of a DAB Channel Decoder 553

DESIGN AND IMPLEMENTATION OF A DAB CHANNEL DECODER

Ming-Der Shieh, Chien-Ming Wu, Hsiao-Hsing Chou, Min-Hui Chen, and Chia-Liang Liu

Department of Electronic Engineering, National Yunlin University of Science & Technology

Industrial Technology Research Institute, Computer & Communication Research Laboratories, Taiwan

Abstract

This paper describes the design of the de-interleaver and punctured Viterbi decoder for the Eureka-147 DAB system and their corresponding VLSI implementations. We emphasize how to efficiently handle the four DAB transmission modes, time/frequency de-interleaving, and path-metric/survivor memory management. Results show that our implementation is modular, consumes less silicon area, and can be extended to higher transmission rates. The core size of the resulting chip implementation is 4990x4930 µm² based on the TSMC 0.6 µm single-polysilicon-triple-metal CMOS process.

1. Introduction

The Digital Audio Broadcasting (DAB) system, described in the European Eureka-147 standard [1], offers high-quality audio services, supports various data services for mobile reception, and might replace traditional radio systems. In recent years, considerable effort has been devoted to developing low-cost, low-power products. Basically, two strategies are employed to implement the DAB receiver: the DSP-based architecture [2,3] and the ASIC-based implementation [3-5]. The former offers maximum flexibility, ease of use and simple programming, but provides only limited processing capability. In contrast, the ASIC-based implementation has the potential to support real-time symbol decoding at low cost. This has motivated the development of chip implementations for the DAB receiver.

The DAB system adopts COFDM (Coded Orthogonal Frequency Division Multiplexing) for channel coding and MPEG Layer II audio coding for source coding. In this paper, we focus on the design of the DAB channel decoder based on the specification partitioning described in [3] and the functionality defined in [4]. We share our experience on how to efficiently handle the four DAB transmission modes, time/frequency de-interleaving, and path-metric/survivor memory management (in the Viterbi decoder). Results show that our implementation has the potential to reduce silicon area and power dissipation, and can be extended to higher transmission rates.

Figure 1 depicts the processing units of the DAB receiver, in which the channel decoder handles the digital demodulation and decoding of the baseband signal. In this paper we focus on the design and implementation of the functional blocks for service selection, time/frequency de-interleaving, and punctured Viterbi decoding. The detailed block diagram of our design is shown in Figure 2; for clarity, we call it the DEV (DE-interleaver and Viterbi decoder) chip. Similar to [4], DEV operates on a 12.288 MHz clock and uses an external DRAM (256Kx4) for time de-interleaving and as a buffer for Viterbi decoding. The core size of the resulting chip implementation is about 4990x4930 µm² based on the COMPASS standard cell library and the TSMC (Taiwan Semiconductor Manufacturing Company) 0.6 µm single-polysilicon-triple-metal CMOS process.

This paper is organized as follows. Section 2 describes the architecture and implementation of the de-interleaver in DEV using an efficient table-lookup method. Section 3 gives a systematic technique for developing a punctured Viterbi decoder with the potential for lower chip area and power dissipation. Section 4 presents the output processor, the global controller and the resulting chip implementation of DEV. Finally, Section 5 gives our conclusions.

2. The Deinterleaver Design

As described in [1], the DAB broadcast signal is arranged into transmission frames, each comprising three channels: the synchronization channel, the fast information channel (FIC), and the main service channel (MSC). The FIC gives the configuration information of the MSC and program-related information, and the MSC conveys the sound information and various data service information.

In the DAB system, the permutation of information follows the Eureka-147 DAB standard; therefore, the received signals must be de-interleaved in the receiver before Viterbi decoding. Following the encoding process in DAB, our design partitions the procedure into four main steps: (1) carrier shift, (2) frequency de-interleaving, (3) QPSK mapping, and (4) time de-interleaving. Because the performance of the channel decoder depends strongly on the frequency and time de-interleaving processes, we focus only on the implementation of the frequency and time de-interleavers in this section.

Manuscript received June 28, 1999. IEEE Transactions on Consumer Electronics, Vol. 45, No. 3, August 1999. 0098-3063/99 $10.00 © 1999 IEEE

Figure 1: The block diagram of a DAB receiver.

Figure 2: The block diagram in DEV.

2.1 Frequency De-interleaving Design

For the frequency de-interleaving, we employ the table-lookup method to accelerate the operation. For instance, in transmission mode I there are 2048 complex QPSK symbols in total, and only the 1536 symbols whose $\Pi(i)$ value lies in the range [256, 1792], excluding 1024, are useful. Therefore, a 1536x11 ROM table implementing the mapping $\Pi(i) \rightarrow i$ is needed to find the value of the inverse $\Pi(i)$ function and then store the incoming symbol in the corresponding location $i$ of the SRAM. Note that the input symbols are stored in an SRAM of 2048x8 bits for 4-bit soft decision on the I/Q values, where I and Q represent the real and imaginary parts of the complex QPSK symbol, respectively.

It would be impractical to use a different ROM table for each of the four DAB transmission modes. Therefore, by analyzing the recursive $\Pi(i)$ functions defined in [1] for the four transmission modes, we derived the relationships between the $\Pi(i)$ function of mode I and those of modes II-IV, as listed in Table I. As a result, only the ROM table of 1793x11 bits for transmission mode I is needed to complete the re-ordered sequence mapping for all four DAB transmission modes, which reduces the hardware overhead and gives a speed advantage. In other words, to construct a ROM table that is applicable to the other transmission modes, we build only the ROM table containing the value of the inverse $\Pi(i)$ function in the range [0, 1792] in transmission mode I. For the other transmission modes, the correct mapping is obtained by using the equations defined in Table I.

For example, the total number of complex QPSK symbols is 512 in mode II, and we select the 384 useful symbols in the range [64, 448], excluding 256. With Table I, the 64th input symbol corresponds to the equivalent $\Pi_I(i)$ = [64+384×0] mod 512 = 64, and the content of the ROM table at address 64 is 832. Therefore, the $i$ value of the 64th input symbol in transmission mode II is (832 mod 512) = 320, which is the correct value of the inverse $\Pi(i)$ function in transmission mode II. For the 65th input symbol, the ROM content at the address ([65+384×3] mod 512) is 59. It should be noted that (1) the ROM content at the address [65+384×3], reduced mod 512, is still 59 because of the periodic property of the constructed ROM table, and (2) the hardware implementation of the mod $2^k$ operation can be easily accomplished by ignoring the unnecessary high-order bits. The function 384×C can be implemented as 256×C+128×C, which corresponds to an addition of two left-shifted values of C. Therefore, no multiplication is needed and the hardware overhead for computing the equivalent address is negligible.
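The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the chip's logic: the recurrences $\Pi(i) = 13\,\Pi(i-1) + 511 \bmod 2048$ (mode I) and $\Pi(i) = 13\,\Pi(i-1) + 127 \bmod 512$ (mode II) are assumed from the DAB standard, the names `pi_sequence`, `ROM` and `mode_ii_index` are hypothetical, and the rule C = 3n mod 4 is an assumption that follows from those recurrences rather than a definition taken from Table I.

```python
def pi_sequence(modulus, increment):
    """Return [Pi(0), ..., Pi(modulus-1)] for Pi(i) = 13*Pi(i-1) + increment (mod modulus)."""
    seq = [0]
    for _ in range(modulus - 1):
        seq.append((13 * seq[-1] + increment) % modulus)
    return seq

PI_I = pi_sequence(2048, 511)   # transmission mode I (assumed constants)
PI_II = pi_sequence(512, 127)   # transmission mode II (assumed constants)

# Mode I inverse table: ROM[address] = i such that Pi_I(i) == address.
# Only addresses 0..1792 are kept, matching the 1793-entry ROM in the text.
ROM = [0] * 1793
for i, value in enumerate(PI_I):
    if value <= 1792:
        ROM[value] = i

def mode_ii_index(n):
    """Map a mode II value n = Pi_II(i) back to i using only the mode I ROM."""
    c = (3 * n) % 4                  # assumed rule for C; equals i mod 4
    offset = (c << 8) + (c << 7)     # 384*C built as 256*C + 128*C
    address = (n + offset) & 0x1FF   # mod 512 by dropping high-order bits
    return ROM[address] & 0x1FF      # mod 512 on the ROM content
```

With these assumptions the sketch reproduces the numbers quoted in the text: `ROM[64]` is 832, `mode_ii_index(64)` is 320 and `mode_ii_index(65)` is 59.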

Table I: Relationship between Transmission Mode I and Transmission Modes II-IV.

Mode II: $R_2$: $\Pi_I(i) = [\Pi_{II}(i) + 384 \times C] \bmod 512$

Mode III:

Mode IV: $R_4$: $\Pi_I(i) = [\Pi_{IV}(i) + 256 \times C] \bmod 1024$

2.2 Time De-interleaving Design

As described in the standard [1], time interleaving is applied only to the sub-channels of the MSC; thus no time de-interleaving is needed for the FIC. After frequency de-interleaving, the input symbols are re-ordered based on the $i$ value and stored in the SRAM in their original order. The next step is to separate the (I, Q) pair of each complex QPSK symbol for the FIC and MSC such that the location of Q is K* behind that of I, where K* represents the number of transmitted carriers in the selected transmission mode. This is followed by time de-interleaving for the MSC.

When a complex QPSK symbol is retrieved from the SRAM, the useful I and/or Q values are immediately written into the corresponding locations in the external DRAM of 256Kx4 bits. Note that a useful I (Q) value satisfies the following two constraints: (C.1) it is within the range of the selected sub-channels, and (C.2) the complex QPSK symbol is in the desired $\Pi(i)$ range. The former is based on the information given by the global controller, as discussed in Section 4. The latter can be checked by implementing the $\Pi(i)$ function and then comparing its output value with the desired range.

For the external DRAM management, our strategy is to conceptually partition the 256Kx4 DRAM space into two disjoint regions. The upper 9K memory locations, i.e. addresses 247K-256K, are allocated for storing the FIC, while the others are for the MSC. Therefore, in transmission mode I, for the first (I, Q) pair in the OFDM symbol of index l=2, the values I and Q are stored at locations 247K and 247K+K*=247K+1536, respectively. For the pth (I, Q) pair, they are stored at locations 247K+p and 247K+p+1536, respectively. The (I, Q) pairs in the OFDM symbol of index l=3 are arranged in the same manner, except that the starting address is 247K+3072. The same arrangement can be applied to the other OFDM symbols in the four DAB transmission modes, and the 9K memory locations are large enough to accommodate the FIC of a transmission frame.
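The FIC address arithmetic above can be sketched as follows. This is an illustrative sketch under stated assumptions: "K" in 247K is taken to mean 1024 words, the pair index p is taken as 0-based (so the first pair lands at 247K), and the function name `fic_addresses` is hypothetical.

```python
K = 1024                  # assumption: "K" in 247K/256K means 1024 words
FIC_BASE = 247 * K        # start of the 9K-word FIC region

def fic_addresses(l, p, carriers=1536):
    """DRAM addresses of the I and Q values of the p-th (0-based) pair in OFDM symbol l.

    carriers is K*, the number of transmitted carriers (1536 in mode I).
    """
    start = FIC_BASE + (l - 2) * 2 * carriers   # each symbol occupies 2*K* words
    return start + p, start + p + carriers
```

For mode I, `fic_addresses(2, 0)` gives the pair (247K, 247K+1536) and `fic_addresses(3, 0)` starts at 247K+3072, as in the text. The last Q value of the third FIC symbol falls on address 256K-1, which is consistent with the 9K-word region being just large enough.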

For the MSC, in addition to separating the (I, Q) pairs, we need to perform time de-interleaving based on the time interleaving relationship defined in [1]. Figure 3 shows the arrangement of the memory space for the MSC with two selected sub-channels. Basically, there are 16 base addresses to account for the time interleaving over 16 transmission frames. The base addresses can be calculated from the size of the selected sub-channels in a transmission frame. For simplicity of explanation, it is assumed that only one sub-channel is selected and that the corresponding base addresses are given by the global controller.

In the following, we describe the time de-interleaving process for the useful I/Q values in the MSC. For time de-interleaving, we use the simple table shown in Table II, together with dedicated arithmetic operations, to find the corresponding address in DRAM. The content of Table II is explained as follows. For the first transmission frame 0, the first I value is stored in frame 0, the second I value in frame 8, the third in frame 12, and so forth. For another transmission frame q, the first I value is stored in frame [(0+q) mod 16], the second I value in frame [(8+q) mod 16], etc. The remaining question is which address within the selected frame should store the I value. Note that the Q value in the MSC can be arranged in the same way as in the FIC; therefore we only describe how to compute the right address to store I.

For the first transmission frame 0, the first I value is stored at location 0 in frame 0, the second I value at location 1 in frame 8, the third at location 3 in frame 12, and so forth. The actual address in DRAM for location $a$ in frame $b$ is $a + base_b$. In general, the location relationship within each frame can be applied to the other transmission frames. According to the DAB standard, it is impossible to store the entire MSC of 16 transmission frames in just a 256Kx4 DRAM. Because each capacity unit (CU) contains 64 bits, the maximum number of capacity units that can be stored in the DRAM is 247K/(64x16) = 247.
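The frame selection described above can be sketched as a table lookup plus a modulo-16 addition. This is a minimal sketch: the list `TABLE_II` holds the sixteen values of Table II, the assumption that the table repeats every 16 values (`r % 16`) comes from the DAB standard rather than from the text above, and the function names are hypothetical. Only the frame index and the final base-address addition are modeled; the location within the frame is passed in by the caller.

```python
TABLE_II = [0, 8, 12, 4, 14, 6, 10, 2, 15, 7, 11, 3, 13, 5, 9, 1]

def target_frame(r, q):
    """Frame region that receives the r-th (0-based) I value of transmission frame q."""
    return (TABLE_II[r % 16] + q) % 16

def dram_address(location, frame, base):
    """Actual DRAM address of `location` inside `frame`, given the 16 base addresses."""
    return location + base[frame]
```

For transmission frame 0 the first three I values go to frames 0, 8 and 12; for frame 9 the second I value goes to frame (8+9) mod 16 = 1.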

Figure 4 illustrates the simplified block diagram of our de-interleaver design. The main difference between the FIC and MSC blocks is the way the DRAM address is calculated, as described above. The DRAM interface generates the control signals for accessing the DRAM. The control circuit block generates all of the necessary information, including the signals that decide whether the de-interleaver or the punctured Viterbi decoder can access the DRAM. The timing arrangement guarantees that the old value is read by the punctured Viterbi decoder before the de-interleaver writes the new value into the same DRAM location. The design of the punctured Viterbi decoder is described in the following section.

Figure 3: Partition of DRAM for two sub-channels. (Regions start at base0, base1, base2, ..., base15; the FIC region starts at 247K.)

Table II: Time De-interleaving Table

0 8 12 4 14 6 10 2 15 7 11 3 13 5 9 1

Figure 4: The de-interleaving block diagram of FIC/MSC.

3. The Punctured Viterbi Decoder

Since the FIC and the MSC (including audio and data services) make no difference to the punctured Viterbi decoder (VD) as long as they can be interpreted in the correct format, in the following we use the term data to represent the information to be processed in the punctured VD. Because the data has been stored in the DRAM in the right order, the punctured VD can easily read the FIC and MSC data of a transmission frame from the DRAM.

In the DAB system, the channel encoding process is based on punctured convolutional coding, which allows either Equal Error Protection (EEP) or Unequal Error Protection (UEP). Therefore, it is cost-effective to design a flexible punctured VD that satisfies the variable-rate requirement. Figure 5 shows the block diagram of the variable-rate punctured VD. The punctured processor performs the necessary zero insertion in the convolutional code such that a conventional (4, 1, 6) VD can be used. In the following, we describe the key techniques employed in our design.

Figure 5: The block diagram of the variable-rate punctured Viterbi decoder.

3.1 Punctured Processor

As noted above, the channel encoding process in DAB is based on punctured convolutional coding with either Equal or Unequal Error Protection. The serial mother codeword is split into consecutive blocks of 128 bits, and each block is further divided into four consecutive sub-blocks of 32 bits. Each sub-block is punctured according to the defined puncturing vector, so some predefined code bits are not transmitted and a variable code rate between 8/32 and 8/9 is possible. To simplify the design of the punctured Viterbi decoder, a punctured processor de-punctures the incoming data such that a single Viterbi decoder supporting a code rate of 1/4 can be used for the different code rate requirements.
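De-puncturing a 32-bit sub-block can be sketched as follows. This is an illustrative sketch, not the PLA-based processor described in this section: the function name `depuncture` is hypothetical, the inserted dummy value 0 is an assumption, and the mask returned alongside the values plays the role of the zero-masking information passed to the Viterbi decoder.

```python
def depuncture(received, puncturing_vector):
    """Rebuild a rate-1/4 sub-block from the transmitted soft values.

    puncturing_vector has one entry per mother-code bit (1 = transmitted).
    Returns (values, mask); punctured positions get a dummy 0 and mask 0.
    """
    values, mask = [], []
    stream = iter(received)
    for keep in puncturing_vector:
        if keep:
            values.append(next(stream))
            mask.append(1)
        else:
            values.append(0)
            mask.append(0)
    return values, mask
```

As a usage example, a vector with 16 ones out of 32 positions, such as `[1, 1, 0, 0] * 8`, corresponds to a code rate of 8/16: sixteen received soft values are expanded back to 32 positions, half of which are masked.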

Because different protection profiles are applied to FIC and MSC, an easy way is to employ the table-lookup method to find the corresponding protection profile based on either the selected transmission mode for FIC or the selected audio/data bit rate and protection level for MSC.

The question is how to efficiently deal with a variety of protection profiles. The solution is trivial for the FIC in the different transmission modes, but it is much more complicated for the MSC. In our development, we partition the audio service component protection profiles table (Table 33 in [1]) into five correlated small sub-tables based on the value of the protection level, and implement each sub-table in a PLA (programmable logic array) instead of using a single ROM table, to reduce the hardware requirement. Similarly, another PLA is used to implement the equal error protection profile table (Table 36 in [1]) without partitioning.

Figure 6(a) shows the conceptual block diagram of the PLA implementation for the MSC. The input control signals come from the global controller described in Section 4, which decodes the signaling protocol. The output signals contain the number of blocks L and the puncturing index PI of the current status. The output PI value is then converted to the corresponding puncturing vector PV, as depicted in Figure 6(b), for either the FIC or the MSC. The PV is used in the Viterbi decoding for zero masking, and the puncturing vector table (Table 31 in [1]) is also implemented in a PLA in our design.

Figure 6: (a) The conceptual block diagram of the punctured processor for the MSC, and (b) the block diagram to generate the corresponding FIC/MSC puncturing vector.


3.2 (4,1,6) Viterbi Decoder

The implementation of the VD [11] can generally be divided into three basic units: the branch metric unit (BMU), the add-compare-select unit (ACSU), and the survivor memory unit (SMU). The BMU computes the branch metrics at each time stage based on the received noisy data and the correct codeword. These values are then fed to the ACS unit to update the path metric of each survivor path. Finally, the SMU stores the survivor sequence for each state, performs the traceback operation, and outputs the decoded bits. In general, the VD needs either registers or memory to store the path metric of each state. In our development, a path metric memory unit (PMMU) is allocated for this purpose, and we show how to manage the PMMU efficiently to reduce the hardware overhead and meet the DAB requirement.

3.2.1 Specifications

The specifications of the Viterbi decoder in our design are listed as follows:

(1) (4, 1, 6) convolutional code
(2) generator polynomials: 133, 171, 145, 133 in octal form
(3) code rate: R=1/4
(4) memory order: m=6 (constraint length: K=7)
(5) the number of states in the decoding trellis: N=64
(6) receiver quantization levels: Q=16
(7) truncation length: T=40
(8) maximum output rate: 384 kb/s
(9) system clock frequency: 12.288 MHz

Before describing the design of the individual units in the VD, we review the so-called butterfly module widely used in VD design. Given the trellis diagram of a convolutional code, the decoding function can be performed efficiently by breaking the trellis up into a number of identical elements. For example, the trellis diagram of a rate-1/n convolutional code can be broken up into elements containing a pair of origin states, a pair of destination states, and four interconnecting branches. A butterfly module for the (4, 1, 6) convolutional code is shown in Figure 7; it is the basic processing unit in our design.

The butterfly module in Figure 7 contains four states $S_{j,t}$, $S_{j+32,t}$, $S_{2j,t+1}$ and $S_{2j+1,t+1}$. The only difference between the labels of these four states is shown in the shaded regions. Suppose that the current state is $S_{j,t}$ at the $t$-th time stage. If the input bit is 0, then at the $(t+1)$-th time stage the next state is $S_{2j,t+1}$ and the output branch symbol is $bm1 = (x_4x_3x_2x_1)$. On the other hand, if the input bit is 1, then the next state is $S_{2j+1,t+1}$ with the output branch symbol $bm2 = (y_4y_3y_2y_1)$. The transitions from state $S_{j+32,t}$ can be interpreted in the same way. Based on the butterfly module, we can calculate the branch metric between the received data and the correct codeword. With the path metrics $PM_t(S_{j,t})$ and $PM_t(S_{j+32,t})$ associated with states $S_{j,t}$ and $S_{j+32,t}$, the updated path metrics $PM_{t+1}(S_{2j,t+1})$ and $PM_{t+1}(S_{2j+1,t+1})$ of states $S_{2j,t+1}$ and $S_{2j+1,t+1}$ can then be derived. In this way, all the states at the $t$-th time stage are processed and the new path metric associated with each state is updated. The process is then repeated at the next time stage.

Figure 7: Butterfly module ($j = 0, 1, \ldots, 31$; $bm1 = bm4 = x_4x_3x_2x_1$, $bm2 = bm3 = y_4y_3y_2y_1$).
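The butterfly structure can be checked numerically with a short sketch. This is an illustration under stated assumptions: the generator polynomials are those listed in the specifications, the packing of the encoder window (state bits above the current input bit) and the ordering of the four output bits are assumptions, and the names `branch_symbol` and `butterfly` are hypothetical. Because all three distinct generators tap both the newest and the oldest bit, the relations bm1 = bm4 and bm2 = bm3 hold for either bit ordering.

```python
GENERATORS = [0o133, 0o171, 0o145, 0o133]   # from the specification list

def parity(x):
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1

def branch_symbol(state, bit):
    """4-bit ideal output for leaving `state` (0..63) with input `bit`."""
    window = (state << 1) | bit          # 7-bit encoder window (assumed bit order)
    symbol = 0
    for g in GENERATORS:
        symbol = (symbol << 1) | parity(window & g)
    return symbol

def butterfly(j):
    """States and branch symbols of butterfly j (0 <= j < 32)."""
    return {
        "from": (j, j + 32),
        "to": (2 * j, 2 * j + 1),
        "bm1": branch_symbol(j, 0),        # S_j    -> S_2j
        "bm2": branch_symbol(j, 1),        # S_j    -> S_2j+1
        "bm3": branch_symbol(j + 32, 0),   # S_j+32 -> S_2j
        "bm4": branch_symbol(j + 32, 1),   # S_j+32 -> S_2j+1
    }
```

In this model bm2 is also the bitwise complement of bm1, so a single 4-bit symbol per butterfly is enough to derive all four branch symbols.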

3.2.2 Branch metric unit

The branch metric unit generates the branch metrics. Because the maximum output rate of the VD is 384 kb/s in our design, the BMU can spend 12.288M/384K = 32 clock cycles to compute all the branch metrics at each time stage. This means that only one butterfly module element is needed to maintain a regular data flow with the minimum area requirement for the BMU.

The butterfly module has the following characteristics: (1) the relationship $bm1 = bm4 = (x_4x_3x_2x_1)$ and $bm2 = bm3 = (y_4y_3y_2y_1) = (\bar{x}_4\bar{x}_3\bar{x}_2\bar{x}_1)$ can be applied to simplify the hardware implementation, and (2) the most significant bit (MSB) of a state can be used as the decision bit for the traceback operation. Figure 8 shows the block diagram of the BMU, in which the $(r-c)^2$ block implements the soft decision between the received noisy symbol $r$ and the ideal noiseless codeword symbol $c$. The Mask signal comes from the punctured processor and carries the information of the current puncturing vector PV. If Mask = 0, the calculated branch metric is ignored and is not accumulated in the ACS unit. It should be noted that if the data is punctured before transmission (Mask = 0), the DRAM address is not increased, so that the BMU always reads the correct data.
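The cycle budget and the masked soft-decision metric can be sketched as follows. This is a minimal sketch under assumptions: the metric is taken to be a sum of squared differences over the four code bits, and the ideal levels 0 and 15 for a 4-bit quantizer are illustrative values that the text does not specify. The names `CYCLES_PER_STAGE` and `branch_metric` are hypothetical.

```python
CLOCK_HZ = 12_288_000
MAX_OUTPUT_RATE = 384_000
CYCLES_PER_STAGE = CLOCK_HZ // MAX_OUTPUT_RATE   # clock cycles available per decoded bit

def branch_metric(received, ideal, mask):
    """Sum of squared distances over the four soft values.

    Positions with mask 0 (punctured bits) contribute nothing.
    """
    return sum(m * (r - c) ** 2 for r, c, m in zip(received, ideal, mask))
```

With 32 cycles per stage and 32 butterflies per stage, one butterfly can be processed per clock cycle. For example, `branch_metric([15, 0, 7, 3], [15, 0, 15, 0], [1, 1, 0, 1])` ignores the third position and returns 9.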

3.2.3 Add-compare-select unit

This unit performs the following tasks. (1) Recursively accumulate branch metrics to derive the corresponding path metric of the survivor path for each state. (2) Find the decision bit used in the SMU for the traceback operation and store the best state for starting the traceback. (3) Maintain the minimum path metric derived at the previous time stage for path metric normalization at the present time stage. Similar results can be found in the literature; therefore we only show the resulting implementation in our design.

Figure 8: The block diagram of BMU.

Figure 9 shows the block diagram of the ACSU. The corresponding ACS module and the min path metric/best state module are depicted in Figures 10(a) and (b), respectively. In summary, the ACS module performs the following equations for j = 0, 1, ..., 31.

The path metric normalization is based on the "variable shift" method [12]. Let the state $S_{B,t}$ be the best state at time stage $t$ such that $Min\text{-}PM_t(S_{B,t}) = \min\{PM_t(S_j)\}$ for $j = 0, 1, \ldots, 63$. Then, the path metric normalization operation is accomplished by executing equation (3).

(3) $N\text{-}PM_{t+1}(S_j) = PM_{t+1}(S_j) - Min\text{-}PM_t$
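One trellis stage of add-compare-select with the normalization of equation (3) can be sketched as follows. This is a generic textbook form consistent with the butterfly module, offered as an assumption rather than the exact equations of the design; the function name `acs_stage` and the callback `bm(state, bit)` are hypothetical.

```python
def acs_stage(pm, bm):
    """One add-compare-select stage over 64 states.

    pm  path metrics of stage t, indexed by state
    bm  bm(state, bit) -> branch metric for leaving `state` with input `bit`
    Returns (normalized new metrics, decision bits, best state).
    """
    min_prev = min(pm)                     # Min-PM_t, known from the previous stage
    new_pm = [0] * 64
    decisions = [0] * 64
    for j in range(32):
        for bit in (0, 1):
            upper = pm[j] + bm(j, bit)            # path through S_j
            lower = pm[j + 32] + bm(j + 32, bit)  # path through S_j+32
            dest = 2 * j + bit
            decisions[dest] = 1 if lower < upper else 0   # MSB of the surviving predecessor
            new_pm[dest] = min(upper, lower) - min_prev   # equation (3)
    best_state = min(range(64), key=new_pm.__getitem__)
    return new_pm, decisions, best_state
```

Subtracting the minimum of the previous stage keeps the stored metrics bounded without waiting for the minimum of the stage currently being computed.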

Figure 9: The block diagram of ACSU.

Figure 10: (a) The ACS module and (b) the min path metric/best state module.

3.2.4 Path metric memory unit

In general, the path metrics can be updated in ping-pong mode or by in-place scheduling [6, 8]. In the ping-pong fashion, the path metrics at time stage t+1 are computed using the path metrics at time stage t; therefore it is necessary to double the size of the RAM used for path metric memory. One memory provides the previous path metrics to the ACS unit, while the other stores the new path metrics of the survivor paths, and their roles are swapped at the next time stage t+2. The main disadvantage is that twice the path metric memory is required, although the control circuit is simple. With in-place scheduling, only one path metric memory is required for updating the path metrics, and each old path metric is immediately overwritten by the newly computed one. As a result, its memory size is half that of the ping-pong mode, at the expense of a more complicated control circuit. For a low-area design, we adopt the concept of in-place scheduling for its lower hardware requirement.
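The difference between the two update styles can be illustrated with a small model. This is a sketch of one well-known in-place addressing scheme (the storage address of a state is its 6-bit label rotated right by the stage count), offered as an assumption; it is not necessarily the scheduling or bank partition used in the design. The names `ror6`, `ping_pong` and `in_place` are hypothetical, and `bm(t, state, bit)` is an arbitrary branch-metric callback.

```python
def ror6(s, k=1):
    """Rotate a 6-bit state label right by k positions."""
    k %= 6
    return ((s >> k) | (s << (6 - k))) & 0x3F

def ping_pong(pm, stages, bm):
    """Reference update using a second memory at every stage."""
    for t in range(stages):
        new = [0] * 64
        for j in range(32):
            for bit in (0, 1):
                new[2 * j + bit] = min(pm[j] + bm(t, j, bit),
                                       pm[j + 32] + bm(t, j + 32, bit))
        pm = new
    return pm

def in_place(pm, stages, bm):
    """Same update with a single memory: each butterfly overwrites what it read."""
    mem = list(pm)
    for t in range(stages):
        for j in range(32):
            a, b = ror6(j, t), ror6(j + 32, t)   # where S_j and S_j+32 currently live
            old_a, old_b = mem[a], mem[b]
            mem[a] = min(old_a + bm(t, j, 0), old_b + bm(t, j + 32, 0))   # now holds S_2j
            mem[b] = min(old_a + bm(t, j, 1), old_b + bm(t, j + 32, 1))   # now holds S_2j+1
    return [mem[ror6(s, stages)] for s in range(64)]
```

Both functions return the same path metrics for any number of stages; the in-place version trades the second memory for the rotating address computation, which is the extra control complexity mentioned above.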

In general, using embedded RAM will take a smaller chip area in VLSI implementation than registers do. The problem is that we might need a dual-port memory to either read two old path metrics or update two new path metrics at the same time such that two ACS modules can perform their tasks in parallel in a single clock cycle. To overcome the problem, we have developed an efficient technique to partition the whole RAM into two disjoint banks for increasing the overall memory bandwidth such that two ACS modules can access the corresponding path metrics concurrently. Some of the resulting implementation can also be found in our previous paper [7]. Figure 11 shows the block diagram of the resulting PMMU implementation. As seen from this figure, the 64x14 SRAM is partitioned into two 32x14 banks. The D-Mux circuit is used to distribute the newly computed path metric values into memory banks and the Mux circuit is to provide the previous path metrics to two ACS modules without conflict.

Figure 11: PMMU implementation

3.2.5 Survivor memory unit

Basically, the memory organization techniques for storing the survivor sequences fall into two classes: the register exchange (RE) method and the traceback (TB) method [9, 10]. We choose the TB method instead of the RE method because RE generally requires more power and a larger area in VLSI implementation than TB does.

In practical applications, the survivor memory using the TB method is conceptually divided into a write region and a read region. During each decoding phase, the survivor path is traced and decoded from the read region, while new decision vectors (one decision per state) are written to the write region. As long as the time needed for the read operation is the same as that for the write operation, a regular data flow can be kept in the survivor memory unit. The read region is further partitioned into a merge block (or traceback read) and a decode block (or decode read). Traceback of the merge block is used to generate an estimate of the initial state at the beginning of the decode block. In our design, we use the one-pointer traceback architecture [10] to implement the SMU of the Viterbi decoder, as shown in Figure 12. The block (bank) widths are 40, 20 and 20, respectively, for traceback read (tb), decode read (dc) and writing new data (wr). Therefore, the truncation length T = 40 is satisfied and data can be consistently read from and written into the memory.

Figure 12: Survivor memory management of the one-pointer method [10].

Figure 13 shows the resulting implementation with an SRAM of 80x65 bits. The pre-state represents the current best state during the traceback operation and is used to choose one of the 64 decision bits. With the chosen decision bit, the survivor path circuit finds the next best state based on the rotational property of the butterfly module. The output control re-orders the decoded data. An SRAM of 80x65 bits is chosen because 80 memory locations are required for the write and read regions together, and each decision vector contains 64 bits (64 states for the target convolutional code). The 65th bit is introduced to identify the traceback boundary when multiple sub-channels are selected in different DAB services. More specifically, when a sub-channel change is recognized, the 65th bit is set high and the current best state is stored in a register. During the traceback process, the 65th bit is read and checked. If it is high, the traceback process is reading the boundary between two selected sub-channels, and the present best state is replaced by the one stored in the register.
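The traceback step itself can be sketched in a few lines. This is a simplified model under assumptions: it decodes every stage it walks through instead of separating the merge and decode blocks, it omits the 65th boundary bit, and the function name `traceback` is hypothetical. The decision bit is taken to be the MSB of the surviving predecessor, as stated for the butterfly module.

```python
def traceback(decisions, start_state):
    """Walk the survivor path backwards through the stored decision vectors.

    decisions[t][s] is the decision bit stored for state s at stage t.
    Returns the decoded bits, oldest first.
    """
    state = start_state
    bits = []
    for vector in reversed(decisions):
        bits.append(state & 1)                        # input bit that led into `state`
        state = (state >> 1) | (vector[state] << 5)   # rotate back to the predecessor
    bits.reverse()
    return bits
```

The predecessor is obtained by shifting the state right and re-inserting the decision bit at the top, which is the rotational property of the butterfly module referred to above.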

Figure 13: Implementation of the one-pointer TB method for SMU.

4. Output Processing and Global Controller

4.1 Output Processing

The output processing is the interface between the punctured Viterbi decoder output and the environment. The major functions of output processing include:

(1) Perform pseudo-random binary sequence (PRBS) de-scrambling for energy dispersal of data with polynomial $P(x) = x^9 + x^5 + 1$.
(2) Generate the window signals for sub-channel selection.
(3) Count the number of errors for each received sub-channel.
(4) Verify the cyclic redundancy check (CRC) with polynomial $G(x) = x^{16} + x^{12} + x^5 + 1$ on the FIC, which is stored in an SRAM to be accessed by the host (the global controller).
(5) Output a synchronized clock associated with the output data.
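The energy-dispersal de-scrambling in item (1) can be sketched as a 9-stage shift register. This is a minimal sketch: the all-ones preset of the register is an assumption taken from the DAB standard rather than from the text above, and the names `prbs_bits` and `descramble` are hypothetical.

```python
def prbs_bits(count):
    """First `count` bits of the PRBS for P(x) = x^9 + x^5 + 1, register preset to all ones."""
    reg = [1] * 9                  # reg[0] is the newest stage, reg[8] the oldest
    out = []
    for _ in range(count):
        bit = reg[4] ^ reg[8]      # taps at stages 5 and 9
        out.append(bit)
        reg = [bit] + reg[:-1]     # shift the new bit in
    return out

def descramble(data):
    """XOR the decoded bits with the PRBS; applying it twice restores the input."""
    return [d ^ p for d, p in zip(data, prbs_bits(len(data)))]
```

Because scrambling and de-scrambling are the same XOR operation, `descramble(descramble(bits))` returns the original bits, which gives a simple self-check for the block.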

Figure 14 depicts the block diagram of output processing, in which the Undecoded Data block is used to store parts of the received data read from DRAM. The size of the Undecoded Data block is determined by the decoding latency of the punctured VD. The received data is then compared to the decoded output of punctured Viterbi decoder to find the bit error counts, i.e. the error flag counter (EFC). The Win-Control block is used to control on/off condition of the windows for the selected sub- channels.

4.2 Global Controller

In DEV, the operations of the de-interleaver, punctured Viterbi decoder and output processing are monitored and controlled by the global controller. In the following, we summarize the control signals provided by the global controller. The control signals to the de-interleaver contain (1) the number of CUs and the starting address of the selected sub-channel, (2) the current transmission mode of the DAB service, and (3) the carrier shift information. The control signals to the punctured Viterbi decoder include (1) the protection level, (2) the coding rate, (3) the current transmission mode of the DAB service, (4) the audio/data indicator and reset signal, and (5) the number of CUs of the selected sub-channel. The control signals to output processing are (1) window control information, (2) SRAM control signals to read the FIC data stored in the 96x8 SRAM, and (3) EFC data readout control signals.

Figure 15 shows the simplified block diagram of the global controller. If the DEV controller sends an audio command, the global controller translates the command into the protection level, bit rate, starting address and number of CUs by the table-lookup method. Because the audio information defined in [1] is irregular, we adopt the table-lookup method to simplify the operation. As a result, a 64x15 ROM table is needed in the global controller design. For a data command, the global controller provides the same indications as for an audio command. The only difference lies in the implementation of the translation, which is realized in a PLA to reduce the hardware overhead.

Figure 14: The block diagram of output processing

Figure 15: Block diagram of the global controller.

The chip implementation of the developed DEV (in Figure 2) is shown in Figure 16. The total gate count is 25,582 in terms of 2-input NAND gates, excluding the memories. The resulting core size is 4990x4930 µm² based on the TSMC (Taiwan Semiconductor Manufacturing Company) 0.6 µm single-polysilicon-triple-metal (SPTM) CMOS process. The overall chip size including the I/O pads is about 6736x6700 µm².
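For a sense of scale, the quoted dimensions can be converted to areas. The short calculation below assumes the dimensions are in micrometres and simply multiplies them out; the symbols for the core and chip areas are introduced here only for illustration.

```latex
\begin{align*}
A_{\text{core}} &= 4990 \times 4930\ \mu\text{m}^2 = 24\,600\,700\ \mu\text{m}^2 \approx 24.6\ \text{mm}^2 \\
A_{\text{chip}} &\approx 6736 \times 6700\ \mu\text{m}^2 = 45\,131\,200\ \mu\text{m}^2 \approx 45.1\ \text{mm}^2 \\
A_{\text{core}} / A_{\text{chip}} &\approx 0.55
\end{align*}
```

Under this reading, the core occupies a little over half of the die, with the rest taken by the I/O pad ring and its spacing.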

Figure 16: The layout of the developed DEV.

5. Conclusion

In this paper, we describe the methodologies employed in implementing the Eureka-147 DAB channel decoder. All the functional blocks are designed, simulated, and verified using the Synopsys and Cadence software, and the final layout is ready for VLSI fabrication based on the 0.6 µm SPTM process and the Compass cell library. Results show that our implementation has the potential to reduce silicon area and power dissipation and to facilitate extension to higher transmission rates.

6. Acknowledgments

This work was supported in part by the Computer & Communication Research Laboratories under contract G4-87027-a. The authors want to thank C. P. Hung, H. S. Lin, and Hsin-Fu Lo for many helpful discussions on the DAB specifications and the final implementation.

References

[1] ETS 300 401: Radio broadcasting system; Digital Audio Broadcasting (DAB) to mobile, portable and fixed receivers, February 1995. (The 2nd version was revised in May 1997.)

[2] H. Usuba, S. Kakiuchi, and K. Yamauchi, "A Prototype DAB Receiver," in Proc. IEEE International Conference on Consumer Electronics, pp. 52-53, 1996.

[3] A. Delaruelle, J. Huisken, J. V. Loon, and F. Welten, "A Chip Set for a Digital Audio Broadcasting Channel Decoder," in Proc. IEEE Custom Integrated Circuit Conference, pp. 13.4.1-13.4.4, 1995.

[4] J. A. Huisken, F. V. Laar, A. Delaruelle, and N. J. L. Philips, "Specification, Partitioning and Design of a DAB Channel Decoder," VLSI Signal Processing, vol. VI, pp. 21-29, Oct. 1993.

[5] T. Fukami, A. Tanaka, K. Fukunaga, K. Nomura, and S. Kobayashi, "On-Chip Baseband Decoder for a DAB Receiver," in Proc. IEEE Custom Integrated Circuit Conference, pp. 400-401, 1998.

[6] M. Biver, H. Kaeslin, and C. Tommasini, "In-Place Updating of Path Metrics in Viterbi Decoders," IEEE J. Solid-State Circuits, vol. 24, pp. 1158-1159, Aug. 1989.

[7] M. D. Shieh, M. H. Sheu, C. M. Wu, and W. S. Ju, "Efficient Management of In-Place Path Metric Update and its Implementation for Viterbi Decoders," in Proc. IEEE International Symposium on Circuits and Systems, pp. IV449-452, 1998.

[8] C. M. Rader, "Memory Management in a Viterbi Decoder," IEEE Trans. Communications, vol. 29, pp. 1399-1401, Sept. 1981.

[9] R. Cypher and C. B. Shung, "Generalized Trace Back Techniques for Survivor Memory Management in the Viterbi Algorithm," in Proc. GLOBECOM, pp. 1318-1322, Dec. 1990.

[10] G. Feygin and P. G. Gulak, "Architectural Tradeoffs for Survivor Sequence Memory Management in Viterbi Decoder," IEEE Trans. Communications, vol. 41, no. 3, pp. 425-429, March 1993.

[11] H. L. Lou, "Implementing the Viterbi Algorithm," IEEE Signal Processing Magazine, pp. 42-45, 1995.

[12] C. B. Shung, G. Ungerboeck, and H. K. Thapar, "VLSI Architectures for Metric Normalization in the Viterbi Algorithm," in Proc. GLOBECOM, pp. 1723-1728, 1990.

IEEE Transactions on Consumer Electronics, Vol. 45, No. 3, August 1999

Ming-Der Shieh received the B.S. degree in electrical engineering from National Cheng Kung University, Taiwan, in 1984, the M.S. degree in electronic engineering from National Chiao Tung University, Taiwan, in 1986, and the Ph.D. degree in electrical engineering from Michigan State University, East Lansing, in 1993. From 1988 to 1989, he was an engineer at United Microelectronics Corporation, Taiwan. He is currently an Associate Professor at Yunlin University of Science & Technology, Taiwan. His research interests include computer-aided design, VLSI for signal processing, and VLSI design and testing.

Dr. Chia-Liang Liu received the B.S. and M.S. degrees in Electrical Engineering from National Taiwan University, Taipei, Taiwan, in 1982 and 1984, respectively, and the Ph.D. degree, also in Electrical Engineering, from the University of California, Davis, in 1992. Since April 1993, he has been with the Computer & Communication Research Laboratories, Industrial Technology Research Institute (CCL/ITRI), Taiwan. He is responsible for personal/mobile broadcasting and communications system designs at CCL/ITRI. He is currently a section manager in CCL/ITRI. His research interests include signal processing and digital communications and broadcasting systems. Dr. Liu is a member of IEEE.

Chien-Ming Wu received the B.S. degree in electronic engineering from National Yunlin University of Science & Technology in 1997. He is currently a master's student in the Institute of Electronic and Information Engineering at National Yunlin University of Science & Technology, Taiwan. His research interests include VLSI design in communication and coding theory, and digital signal processing.

Min-Hui Chen received the B.S. and the M.S. degrees in electronic engineering from National Ocean University, Keelung, Taiwan, in 1993 and 1995. Since 1995, she has been engaged in the Eureka-147 DAB (Digital Audio Broadcasting) project at the Computer & Communication Research Laboratories / Industrial Technology Research Institute, Taiwan. Her research interests include personal communication system design and digital signal processing.

Hsiao-Hsing Chou received the degree in electronic engineering from Feng-Chia University in 1997. He is currently a master's student in the Institute of Electronic and Information Engineering at National Yunlin University of Science & Technology, Taiwan. His research interests include mixed-signal circuit design and testing, VLSI design, and VLSI architecture in digital signal processing.
