
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.18, NO.6, DECEMBER, 2018 ISSN(Print) 1598-1657 https://doi.org/10.5573/JSTS.2018.18.6.694 ISSN(Online) 2233-4866

Manuscript received Mar. 27, 2018; accepted Aug. 12, 2018 Dept. of Information and Communication Engr. Inha University, Incheon, 22212, Korea E-mail : [email protected]

High-performance Syndrome-based SD-BCH Decoder Architecture using Hard-decision Kernel

Taesung Kim and Hanho Lee

Abstract—This paper proposes a high-performance, low-complexity soft-decision Bose–Chaudhuri–Hocquenghem (SD-BCH) decoder architecture and efficient techniques for its design. The proposed SD-BCH decoder uses test syndrome computation and processes all test patterns without iteration. The proposed (1020, 990) SD-BCH decoder achieves a 0.75 dB higher coding gain than the (1020, 990) hard-decision BCH (HD-BCH) decoder. The proposed SD-BCH decoder was designed and implemented in 65-nm CMOS technology. Synthesis results show that the proposed SD-BCH decoder architecture with a serial structure (P = 1) has a gate count of 24.7K, a 69% reduction in hardware complexity compared with the previous SD-BCH decoder architecture.

Index Terms—BCH codes, soft-decision decoding, decoder, modified step-by-step algorithm

I. INTRODUCTION

The Bose–Chaudhuri–Hocquenghem (BCH) codes are a well-known class of cyclic error-correcting codes and are widely used in communication and storage systems [1-3]. There are two types of BCH decoding: hard-decision BCH (HD-BCH) decoding and soft-decision BCH (SD-BCH) decoding. The most widely used HD-BCH decoding methods are the Berlekamp–Massey (BM) algorithm [1], the modified step-by-step (m-SBS) algorithm [4], and the Peterson–Gorenstein–Zierler (PGZ) algorithm [16]. However, SD-BCH decoding can achieve better error-correcting performance than HD-BCH decoding [1, 2]. For practical SD-BCH decoder implementations, several SD-BCH decoding methods have been proposed, such as maximum-likelihood decoding (MLD) [8], generalized minimum-distance (GMD) decoding [9], and the Chase algorithm [10]. In general, SD-BCH decoders use the Chase-II algorithm to handle soft-decision inputs [5, 11], and some architectures contain an iteration process [11]. Some of the error locations in the received codeword can be estimated from the channel information. When no more than d – 1 errors are added to the encoded codeword, the received word remains close to a correctable codeword [12]. Although the hard-decision kernel of an SD-BCH decoder is designed to correct only t = ⌊(d – 1)/2⌋ errors, the SD-BCH decoder can estimate additional error locations from the reliability values. Using this concept, Jung et al. [5] and Lin et al. [14] introduced SD-BCH decoding schemes adapted to the Chase-II algorithm. Such an SD-BCH decoder has 2^p candidate codewords at different Hamming distances from the received word, where p is the number of extra correctable bits. Generally, an SD-BCH decoder with a hard-decision kernel implements the kernel with the BM algorithm, which results in more complex hardware because the processing elements (PEs) cannot share any additional information while they operate. In addition, the adopted BM algorithm architecture requires 2^p key equation solver blocks to process all test patterns. Furthermore, the structure has no parallel structure for key equation solving, and the latency increases to 2t × 2^p [13].

To overcome this bottleneck, Jung et al. [5] and Yang et al. [11] proposed SD-BCH decoder architectures using algebraic decoding algorithms. Several soft-decision decoding architectures adopt the Chase algorithm with a hard-decision kernel. Some of these methods use an iteration process to select one of the candidate codewords and require a test-pattern generator (TPG). Yang [11] proposed an SD-BCH decoder architecture with probabilistic sorting and a TPG; however, the TPG has high hardware complexity. Jung [5] proposed an SD-BCH decoding architecture using the m-SBS algorithm and a sharing syndrome factor calculator (SSFC) [4]. The SSFC block can easily share the channel information while the decoder is operating. However, the m-SBS algorithm has a disadvantage: if t and p are larger than two, the hardware complexity of the SSFC increases dramatically.

Herein, we propose a novel syndrome-based SD-BCH decoding architecture that significantly reduces hardware complexity. The proposed SD-BCH decoder uses test syndrome computation and processes all test patterns without iteration.

The rest of this paper is organized as follows. Section II briefly describes the SD-BCH decoding algorithm. Section III describes the proposed enhanced syndrome-based SD-BCH decoding algorithm and analyzes its bit error rate (BER) performance by simulation. Section IV presents the proposed SD-BCH decoder architecture and design techniques. Section V presents the results and a performance comparison. Finally, Section VI concludes the paper.

II. SD-BCH DECODING ALGORITHM

In this section, we introduce the fundamentals of a syndrome-based SD-BCH $(n, k, t)$ decoding algorithm, where $n$ and $k$ denote the codeword length and the information length, respectively, and $t$ denotes the error-correction capability. Consider an $(n, k, t)$ HD-BCH code over GF($2^m$) with length $n = 2^m - 1$, where $m \ge 3$ (the (1020, 990, 3) code used in this paper is a shortened code over GF($2^{10}$), for which $2^m - 1 = 1023$). Suppose that $c(x)$ is the transmitted codeword and $r(x) = c(x) + e(x) = r_0 + r_1 x + \dots + r_{n-1} x^{n-1}$ is the received word, where $e(x) = e_0 + e_1 x + \dots + e_{n-1} x^{n-1}$ is the error pattern. For error correction, the syndromes $S(x) = S_1 + S_2 x + \dots + S_{2t-1} x^{2t-2}$ are calculated from $r(x)$, where $S_i = r(\alpha^i)$ and $\alpha$ is a primitive element of GF($2^m$).
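As a concrete reference point for the syndrome definition above, the following Python sketch evaluates $S_1$, $S_3$, and $S_5$ over GF($2^{10}$) for a length-1020 hard-decision word. The primitive polynomial $x^{10} + x^3 + 1$ and the helper names (`alpha`, `gf_mul`, `gf_pow`, `syndromes`) are our own assumptions, since the paper does not specify them. Only odd-indexed syndromes are computed because, for binary codes, $S_{2i} = S_i^2$.

```python
# GF(2^10) log/antilog tables. The primitive polynomial x^10 + x^3 + 1 is an
# assumption; the paper does not say which polynomial it uses.
M, Q, PRIM = 10, 1023, 0x409
EXP, LOG = [0] * (2 * Q), [0] * (Q + 1)
_v = 1
for _e in range(Q):
    EXP[_e] = EXP[_e + Q] = _v
    LOG[_v] = _e
    _v <<= 1
    if _v & (1 << M):
        _v ^= PRIM

def alpha(e):
    """alpha^e for any integer exponent e."""
    return EXP[e % Q]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_pow(a, e):
    """a^e for e >= 1 (0 when a == 0)."""
    return 0 if a == 0 else EXP[(LOG[a] * e) % Q]

def syndromes(r_hd, powers=(1, 3, 5)):
    """S_i = r(alpha^i): XOR of alpha^(i*j) over the positions j with r_j = 1."""
    result = {}
    for i in powers:
        s = 0
        for j, bit in enumerate(r_hd):     # r_hd[j] is the coefficient of x^j
            if bit:
                s ^= alpha(i * j)          # addition in GF(2^m) is XOR
        result[i] = s
    return result

n = 1020                                   # shortened length used in the paper
word = [0] * n                             # all-zero codeword ...
word[17] = 1                               # ... hit by a single error at position 17
S = syndromes(word)
```

For a single error at position j the sketch returns $S_i = \alpha^{ij}$, so $S_3 = S_1^3$, the property the hard-decision kernel later relies on.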

The SD-BCH decoding algorithm has been presented in [5, 11, 14]. Yang et al. [11] summarized the Chase-II algorithm, a sub-optimum soft-decision algorithm that uses error-correction-only hard-decision decoding as its kernel. The Chase-II procedure is as follows:

1) Select the locations of the p least reliable bits (LRBs), where $p = \lfloor d_{min}/2 \rfloor$ and $d_{min}$ is the minimum Hamming distance of the code.

2) Generate the test patterns by considering all combinations of the LRBs. Each test pattern is obtained by flipping a subset of the LRBs, $r_p(x) = r(x) + x_p(x)$, where $x_p(x)$ is nonzero only at the selected LRB positions.

3) Decode the test patterns sequentially using the hard-decision decoding (HDD) kernel. If a test pattern is decoded successfully, the resulting codeword is added to the list of candidate codewords.

4) Evaluate the Euclidean distance of each candidate codeword in the list and choose the one with the smallest Euclidean distance as the decision codeword. If the best candidate codeword is identified before all $2^p$ iterations are complete, the remaining test patterns are skipped.

The test-syndrome-based SD-BCH decoding algorithm was introduced in [5]. It differs slightly from the SD-BCH decoding algorithm proposed by Yang [11]. The key idea of the algorithm is to add test syndromes to the syndromes, and these are combined in the SSFC. Because a test pattern is generated by flipping LRBs of the received codeword, the syndromes of the test pattern are easily computed. To calculate an error location polynomial using test syndromes, we must consider the combinations of LRBs and compute the corresponding pattern syndromes. In [5], the procedure for conventional SD-BCH decoding using test syndromes is described as follows:

1) Compute the syndromes and p test syndromes from the received vector, where $p = \lfloor d_{min}/2 \rfloor$ and $d_{min}$ is the minimum Hamming distance of the code. The test syndromes are obtained from the positions of the LRBs; each LRB location is mapped to an element of GF($2^m$) as follows:

$S_{t_1} = \alpha^{t_1}$ (test syndrome for the 1st LRB location), $S_{t_2} = \alpha^{t_2}$ (test syndrome for the 2nd LRB location), …, $S_{t_k} = \alpha^{t_k}$ (test syndrome for the last LRB location),

where $0 \le t_1 < t_2 < \dots < t_k \le n-1$ and $n$ denotes the codeword length.


2) Calculate the coefficients for all test patterns simultaneously using a modified SSFC.

3) Perform the Chien search for all test patterns in parallel and examine the H value of each test pattern.

4) Decide the best codeword and correct the received codeword by removing the error vector.
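Step 1 works because syndromes are linear: flipping bit $t$ of $r(x)$ adds $(\alpha^{t})^i$ to $S_i$. The sketch below checks that adding the test syndromes of three flipped positions reproduces the syndromes recomputed from the flipped word. It assumes the primitive polynomial $x^{10} + x^3 + 1$ for GF($2^{10}$), and the helper names are hypothetical.

```python
import random

# GF(2^10) tables; x^10 + x^3 + 1 is an assumed primitive polynomial.
M, Q, PRIM = 10, 1023, 0x409
EXP, LOG = [0] * (2 * Q), [0] * (Q + 1)
_v = 1
for _e in range(Q):
    EXP[_e] = EXP[_e + Q] = _v
    LOG[_v] = _e
    _v <<= 1
    if _v & (1 << M):
        _v ^= PRIM

def alpha(e):
    return EXP[e % Q]

def gf_pow(a, e):
    """a^e for e >= 1 (0 when a == 0)."""
    return 0 if a == 0 else EXP[(LOG[a] * e) % Q]

def syndrome(r_hd, i):
    """S_i computed directly from a hard-decision word."""
    s = 0
    for j, bit in enumerate(r_hd):
        if bit:
            s ^= alpha(i * j)
    return s

def pattern_syndrome(s_i, i, flipped):
    """S_i of r + sum of x^t over the flipped LRB positions t, without re-reading r."""
    for t in flipped:
        s_i ^= gf_pow(alpha(t), i)       # test syndrome alpha^t raised to the power i
    return s_i

rng = random.Random(2018)
n = 1020
r = [rng.randint(0, 1) for _ in range(n)]
lrbs = [10, 400, 900]                    # hypothetical LRB positions
r_flipped = r[:]
for t in lrbs:
    r_flipped[t] ^= 1
```

This is why a test-syndrome-based decoder needs no TPG: each test pattern costs a few GF additions rather than a new pass over the codeword.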

The test-syndrome-based SD-BCH decoding method requires no TPG. All polynomials are processed simultaneously, with no iteration process and no parallelized probabilistic sorting units.

III. PROPOSED ENHANCED SYNDROME-BASED SD-BCH DECODING ALGORITHM

An HD-BCH (n, k, t) code can correct t errors, where n is the codeword length and k is the message length. In contrast, an SD-BCH code is an extension of an HD-BCH code based on the Chase algorithm [10]. An SD-BCH decoder can correct p additional errors, whose positions are found by locating the LRBs; therefore, the total error-correction capability becomes t + p, where $p \le d_{min}/2$. Generally, this scheme requires a pattern syndrome generator to decode all test patterns with a hard-decision kernel. With the Chase algorithm, an SD-BCH decoder evaluates the soft-decision metric using the hard-decision kernel. In this section, we propose a syndrome-based soft-decision decoding scheme with non-iterative processing, which does not generate any test patterns to evaluate the soft-decision metric but instead calculates the syndromes of all test patterns using the SSFC method. However, this method increases the operational complexity of the hard-decision kernel when t > 4. We therefore consider only t = p = 3, and the proposed algorithm requires only three test syndromes, αt0, αt1, and αt2, without a test pattern generator. The procedure of the proposed syndrome-based SD-BCH (1020, 990, 3) decoding with non-iterative processing is as follows:

1) Extract the hard-decision value from each log-likelihood ratio (LLR) input.

2) Compute the syndromes S1, S3, and S5 in the syndrome computation (SC) block, and the test syndromes αt0, αt1, and αt2 in the test syndrome computation (TSC) block.

3) Generate the syndromes Si,tp0, Si,tp1, Si,tp2, Si,tp3, Si,tp4, Si,tp5, Si,tp6, and Si,tp7 of all test patterns in sequence in the pattern syndrome computation (PSC), where i = 1, 3, 5.

4) Compute the error polynomials and the determinant values of all test patterns in sequence in the HD-BCH kernel.

5) Examine the M value in the pre-Chien search (pre-CS) and metric check (MC).

6) Select the error polynomial and search for the error locations in the error polynomial decision (EPD) block.

7) Generate the error vector and correct the errors.

The proposed SD-BCH decoding algorithm is presented in detail below:

Proposed Syndrome-based Soft-Decision (1020, 990, 3) BCH Decoding
Input: r_i = {r_i,0, r_i,1, r_i,2, r_i,3} (i = n-1, n-2, …, 1, 0)
Start soft-decision BCH decoding:
1) Separate the hard decision and the magnitude
   Hard decision: rHD,i = r_i,0 (i = n-1, n-2, …, 1, 0)
   Magnitude: |r_i| = m_i = {m_i,0, m_i,1, m_i,2} (i = n-1, n-2, …, 1, 0)
   if (rHD,i == 0)
       m_i = {r_i,1, r_i,2, r_i,3}
   else begin
       if      ({r_i,1, r_i,2, r_i,3} == {1, 1, 1}) m_i = {0, 0, 1}
       else if ({r_i,1, r_i,2, r_i,3} == {1, 1, 0}) m_i = {0, 1, 0}
       else if ({r_i,1, r_i,2, r_i,3} == {1, 0, 1}) m_i = {0, 1, 1}
       else if ({r_i,1, r_i,2, r_i,3} == {1, 0, 0}) m_i = {1, 0, 0}
       else if ({r_i,1, r_i,2, r_i,3} == {0, 1, 1}) m_i = {1, 0, 1}
       else if ({r_i,1, r_i,2, r_i,3} == {0, 1, 0}) m_i = {1, 1, 0}
       else m_i = {1, 1, 1}
   end
2) Syndrome and test syndrome calculator
   Syndrome calculator: for i = n-1 down to 0
       S1 = Σ rHD,i·α^i, S3 = Σ rHD,i·α^(3i), S5 = Σ rHD,i·α^(5i)
   Test syndrome calculator:
   Initial: min0 = {1, 1, 1}, min1 = {1, 1, 1}, min2 = {1, 1, 1}
   for i = n-1 down to 0 begin
       if (i > 2(n-1)/3) begin
           if (m_i < min0) min0 = m_i, α^t0 = α^i
       end else if (i > (n-1)/3) begin
           if (m_i < min1) min1 = m_i, α^t1 = α^i
       end else begin
           if (m_i < min2) min2 = m_i, α^t2 = α^i
       end
   end
3) Pattern syndrome generator
   G = {G2, G1, G0}, C = {C1, C0}
   Initial: S_i,tp0 = S_i, C = {0, 0}
   for j = 1 to 2^p - 1 begin
       {C1, C0} = {C0, ~C1}   (simultaneous update: 00, 01, 11, 10, …)
       G0 = C0 ^ C1, G1 = C0 & C1, G2 = ~C0 & ~C1
       if      ({G2, G1, G0} == {0, 0, 1}) S_i,tpj = S_i,tp(j-1) + (α^t0)^i
       else if ({G2, G1, G0} == {0, 1, 0}) S_i,tpj = S_i,tp(j-1) + (α^t1)^i
       else if ({G2, G1, G0} == {1, 0, 0}) S_i,tpj = S_i,tp(j-1) + (α^t2)^i
       else S_i,tpj = S_i,tp(j-1)
   end
4) Syndrome factor calculator
   deg = {deg2,tpj, deg3,tpj}
   for j = 0 to 2^p - 1 begin
       if (S1,tpj^3 + S3,tpj == 0) begin
           R_tpj = S1,tpj^3 + S3,tpj; A_tpj = S1,tpj^2; B_tpj = S1,tpj; C_tpj = 0
           deg2,tpj = 0
           if (S1,tpj·(S1,tpj^2·S3,tpj + S5,tpj) == 0) deg3,tpj = 0 else deg3,tpj = 1
       end else begin
           R_tpj = (S1,tpj^3 + S3,tpj)^2 + S1,tpj·(S1,tpj^2·S3,tpj + S5,tpj)
           A_tpj = S1,tpj^2·S3,tpj + S5,tpj; B_tpj = S1,tpj^4 + S1,tpj·S3,tpj
           C_tpj = S1,tpj^3 + S3,tpj
           deg2,tpj = 1
           if ((S1,tpj^3 + S3,tpj)^2 + S1,tpj·(S1,tpj^2·S3,tpj + S5,tpj) == 0) deg3,tpj = 0 else deg3,tpj = 1
       end
   end
5) Pre-Chien search and metric check
   Pre-Chien search: for j = 0 to 2^p - 1 begin
       for i = n-1 down to 0 begin
           if (R_tpj == A_tpj·α^i + B_tpj·α^(2i) + C_tpj·α^(3i)) M_tpj,i = 1 else M_tpj,i = 0
       end
   end
   Metric check: for j = 0 to 2^p - 1: M_tpj = Σ_i M_tpj,i
6) Test syndrome Chien search
   for i = n-1 down to 0 begin
       if (α^i + α^t0 == 0) Hαt0,i = 1 else Hαt0,i = 0
       if (α^i + α^t1 == 0) Hαt1,i = 1 else Hαt1,i = 0
       if (α^i + α^t2 == 0) Hαt2,i = 1 else Hαt2,i = 0
   end
7) Error polynomial decision and Chien search
   Error polynomial decision: for j = 0 to 2^p - 1 begin
       if (ctr == j) begin
           R_sel = R_tpj, A_sel = A_tpj, B_sel = B_tpj, C_sel = C_tpj, e_sel = e_ts,j
           break
       end
   end
   Chien search: for i = n-1 down to 0 begin
       if (R_sel == A_sel·α^i + B_sel·α^(2i) + C_sel·α^(3i)) d_i = 1 else d_i = 0
   end
   Merge test syndrome Chien search:
       e_ts,0 = 0; e_ts,1 = Hαt0; e_ts,2 = Hαt0 + Hαt1; e_ts,3 = Hαt1;
       e_ts,4 = Hαt1 + Hαt2; e_ts,5 = Hαt0 + Hαt1 + Hαt2; e_ts,6 = Hαt0 + Hαt2; e_ts,7 = Hαt2
8) Corrected codeword
   C_i = rHD,i + d_i + e_sel,i
Output: C_i

First, we separate the hard decision rHD,i and the magnitude |ri| from each received LLR. The hard decision is the most significant (sign) bit of the received LLR and is fed to the syndrome calculator in the hard-decision kernel. The magnitude is used to locate the LRBs, whose locations the TSC converts into test syndrome values.
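The magnitude mapping in step 1 of the listing is consistent with reading the 4-bit input as a two's-complement LLR whose magnitude is saturated to three bits: for a negative input, the magnitude is the inverted low bits plus one, and the input {1, 0, 0, 0} (magnitude 8) saturates to {1, 1, 1}. The function below is a sketch under that interpretation; `split_llr` is a hypothetical name.

```python
def split_llr(r0, r1, r2, r3):
    """Return (hard decision, 3-bit magnitude) for the 4-bit input {r0, r1, r2, r3}.

    Assumes r0 is the sign bit of a two's-complement value.
    """
    low = (r1 << 2) | (r2 << 1) | r3
    if r0 == 0:
        return 0, low                          # non-negative: magnitude is the low bits
    # Negative: two's-complement negation (invert, add one), saturated to 3 bits.
    return 1, min((~low & 0b111) + 1, 0b111)

print(split_llr(1, 1, 1, 1))                   # smallest negative magnitude
print(split_llr(1, 0, 0, 0))                   # most negative value, saturated
```

The saturation only affects one input code, so the magnitude ordering used by the TSC is preserved apart from merging the two largest negative magnitudes.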

To find the three smallest LLR magnitudes, the codeword is divided into three groups. Each group keeps a running minimum (min0, min1, and min2), which determines the test syndromes αt0, αt1, and αt2, respectively. The first test syndrome is taken from {α^{(n-1)/3}, …, α^0} in the first group, the next from {α^{2(n-1)/3}, …, α^{(n-1)/3+1}}, and the last from {α^{n-1}, …, α^{2(n-1)/3+1}}. The minimum values min0, min1, and min2 are initialized to {1, 1, 1} and compared with each input magnitude |ri|, where i = n-1, …, 0. Whenever minj is larger than |ri|, the TSC replaces minj with |ri| and samples the corresponding test syndrome αtj = α^i, where j = 0, 1, 2.

After the SC computes the syndromes and the TSC samples the LRB locations, the syndromes of the test patterns are generated in sequence from the syndromes S1, S3, S5 and the test syndromes αt0, αt1, αt2. The test syndrome selection bits are produced by a two-bit Gray code counter C and a three-bit Gray code differential decoder G, whose output rotates through "001," "010," "001," and "100" repeatedly for two cycles. By using signal G to select αt0, αt1, or αt2 and adding its power to the syndromes S1, S3, S5, we generate the syndromes of all test patterns (the pattern syndromes) as follows:

$S_{i,tp0} = S_i$ (0th pattern syndrome),
$S_{i,tp1} = S_{i,tp0} + (\alpha^{t_0})^i$ (1st pattern syndrome),
$S_{i,tp2} = S_{i,tp1} + (\alpha^{t_1})^i$ (2nd pattern syndrome),
$S_{i,tp3} = S_{i,tp2} + (\alpha^{t_0})^i$ (3rd pattern syndrome),
$S_{i,tp4} = S_{i,tp3} + (\alpha^{t_2})^i$ (4th pattern syndrome),
$S_{i,tp5} = S_{i,tp4} + (\alpha^{t_0})^i$ (5th pattern syndrome),
$S_{i,tp6} = S_{i,tp5} + (\alpha^{t_1})^i$ (6th pattern syndrome),
$S_{i,tp7} = S_{i,tp6} + (\alpha^{t_0})^i$ (7th pattern syndrome).
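The following sketch simulates the pattern syndrome generator. It assumes the counter update $\{C_1, C_0\} \leftarrow \{C_0, \overline{C_1}\}$ is applied simultaneously and decodes $G_0 = C_0 \oplus C_1$, $G_1 = C_0 C_1$, $G_2 = \overline{C_0}\,\overline{C_1}$, a decoding that reproduces the stated rotation 001, 010, 001, 100. Because consecutive patterns differ in exactly one LRB (a binary-reflected Gray code), the seven additions visit all $2^3$ subsets of the LRBs. The helper names and the field polynomial $x^{10} + x^3 + 1$ are assumptions.

```python
# GF(2^10) tables; x^10 + x^3 + 1 is an assumed primitive polynomial.
M, Q, PRIM = 10, 1023, 0x409
EXP, LOG = [0] * (2 * Q), [0] * (Q + 1)
_v = 1
for _e in range(Q):
    EXP[_e] = EXP[_e + Q] = _v
    LOG[_v] = _e
    _v <<= 1
    if _v & (1 << M):
        _v ^= PRIM

def alpha(e):
    return EXP[e % Q]

def gf_pow(a, e):
    return 0 if a == 0 else EXP[(LOG[a] * e) % Q]

def psg_select_sequence(steps=7):
    """Index (0, 1, 2) of the test syndrome added at steps j = 1..steps."""
    c1 = c0 = 0
    seq = []
    for _ in range(steps):
        c1, c0 = c0, 1 - c1                # simultaneous update: 00 -> 01 -> 11 -> 10 -> 00
        g0 = c0 ^ c1
        g1 = c0 & c1                       # assumed decode giving 001, 010, 001, 100
        g2 = (1 - c0) & (1 - c1)
        seq.append({(0, 0, 1): 0, (0, 1, 0): 1, (1, 0, 0): 2}[(g2, g1, g0)])
    return seq

def pattern_syndromes(s_i, i, tests):
    """[S_i,tp0, ..., S_i,tp7] from S_i and tests = [alpha^t0, alpha^t1, alpha^t2]."""
    out = [s_i]
    for g in psg_select_sequence():
        out.append(out[-1] ^ gf_pow(tests[g], i))
    return out

def flipped_sets():
    """Which LRBs (0, 1, 2) each test pattern has flipped, for bookkeeping."""
    sets = [frozenset()]
    for g in psg_select_sequence():
        sets.append(sets[-1] ^ {g})
    return sets

print(psg_select_sequence())
```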

All pattern syndromes are fed to the hard-decision kernel, which uses the m-SBS algorithm [4]. The coefficients Rtpj, Atpj, Btpj, and Ctpj of the error location polynomial are calculated from the syndrome values Si,tpj, where j = 0, …, 2^p − 1 and p is the number of LRBs, as follows:


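$$(S_{1,tpj}^3 + S_{3,tpj})^2 + S_{1,tpj}(S_{1,tpj}^2 S_{3,tpj} + S_{5,tpj}) = (S_{1,tpj}^2 S_{3,tpj} + S_{5,tpj})\,\alpha^{k} + (S_{1,tpj}^4 + S_{1,tpj} S_{3,tpj})\,\alpha^{2k} + (S_{1,tpj}^3 + S_{3,tpj})\,\alpha^{3k} \tag{1}$$

$$R_{tpj} = A_{tpj}\,\alpha^{k} + B_{tpj}\,\alpha^{2k} + C_{tpj}\,\alpha^{3k} \tag{2}$$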

where j = 0, …, 7 and k = 0, …, n − 1. Eqs. (1) and (2) are third-degree error location polynomials derived using the SSFC with the pattern syndromes. If the error vector has a Hamming weight of at most one, $S_{1,tpj}^3 + S_{3,tpj}$ is equal to 0, and the polynomial reduces to the second-degree form below, whose only nonzero root is $\alpha^k = S_{1,tpj}$:

$$S_{1,tpj}^3 + S_{3,tpj} = S_{1,tpj}^2\,\alpha^{k} + S_{1,tpj}\,\alpha^{2k} \tag{3}$$

$$R_{tpj} = A_{tpj}\,\alpha^{k} + B_{tpj}\,\alpha^{2k} \tag{4}$$

The degree of the polynomial can be determined from the syndrome factors as follows:

$$\mathrm{deg}_{2,tpj} = S_{1,tpj}^3 + S_{3,tpj} \tag{5}$$

$$\mathrm{deg}_{3,tpj} = (S_{1,tpj}^3 + S_{3,tpj})^2 + S_{1,tpj}(S_{1,tpj}^2 S_{3,tpj} + S_{5,tpj}) \tag{6}$$

Eq. (5) is used to decide whether the reduced polynomial applies: if Eq. (5) is equal to zero, the polynomial takes the form of Eq. (3). For correctable patterns, Eq. (6) is nonzero only when the polynomial has degree three. The degree information deg = {deg2,tpj, deg3,tpj} is therefore three for degree three, two for degree two, and zero for degree one or zero. Eq. (1) or (3) is used to evaluate the M value in the pre-CS and MC. If $R_{tpj} + A_{tpj}\alpha^k + B_{tpj}\alpha^{2k} + C_{tpj}\alpha^{3k}$ or $R_{tpj} + A_{tpj}\alpha^k + B_{tpj}\alpha^{2k}$ is equal to zero, M is incremented; otherwise, M holds its value. M therefore gives the number of errors located by each polynomial. The controller uses deg and M to detect polynomial violations and to select the best polynomial for correcting the errors. The selected polynomial is passed to the EPD, and the LRB locations are restored by the test syndrome Chien search (TSCS). By adding the results of the EPD and the TSCS, we finally obtain the error vector.

Fig. 1 compares the BER performance of the proposed (1020, 990) SD-BCH decoder and the (1020, 990) HD-BCH decoder, assuming binary phase-shift keying (BPSK) over an additive white Gaussian noise (AWGN) channel. The proposed SD-BCH decoder achieves a 0.75 dB higher coding gain than the HD-BCH decoder at BER = $5 \times 10^{-6}$.

IV. PROPOSED ENHANCED SYNDROME-BASED SD-BCH DECODER ARCHITECTURE

The proposed SD-BCH decoder architecture has four main blocks, namely the test syndrome computation (TSC), the hard-decision kernel, the error polynomial decision (EPD), and the test syndrome Chien search (TSCS), together with a decision controller, as shown in Fig. 2. The hard-decision kernel consists of a syndrome computation (SC) block, a pattern syndrome generator (PSG), a sharing syndrome factor calculator (SSFC), a pre-Chien search (pre-CS), and a metric check (MC).

1. Test Syndrome Computation

The TSC shown in Fig. 3 is one of the key blocks for processing the channel information in the proposed SD-BCH decoder. The proposed TSC needs only buffers to sample the syndrome values associated with the LRB locations, whereas a conventional test pattern generator requires many registers. The TSC uses a control signal ctrsh and a comparison signal cp to sample the test syndromes αt0, αt1, αt2 by comparing the magnitude of the incoming LLR with the stored value. The first test syndrome αt0 is sampled from |r0| to |r_{n/3-1}|, αt1 from |r_{n/3}| to |r_{2n/3-1}|, and αt2 from |r_{2n/3}| to |r_{n-1}|, where n is the codeword length. If the incoming magnitude is lower than the stored value, cp is one, and the syndrome value that indicates the current LLR location is sampled as the test syndrome. If cp is 0, the test syndrome is not sampled. The control signal ctrsh is one when the (n/3−1)-th, (2n/3−1)-th, and (n−1)-th LLR magnitudes are inserted; the sampled syndrome values are then shifted into the αt0, αt1, and αt2 registers.

Fig. 1. BER performance of the proposed (1020, 990) SD-BCH decoder and a (1020, 990) HD-BCH decoder.

2. Hard-decision Kernel

The hard-decision kernel adopts the HD-BCH decoding structure based on the m-SBS algorithm [4]. It consists of the SC, SSFC, PSG, pre-CS, and MC blocks, as shown in Fig. 2.

A. Syndrome Computation

The SC block computes the syndromes Si (1 ≤ i ≤ 2t-1) from the incoming hard-decision bits. To calculate all coefficients of the error polynomial, the (1020, 990) HD-BCH decoder based on the m-SBS algorithm requires only the syndromes Si (i = 1, 3, 5). The SC block consists of parallel syndrome computation cells that process the parallelized codeword, as shown in Fig. 4(a).
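A parallel syndrome cell of this kind can be modelled as Horner evaluation that consumes P bits per clock: each clock multiplies the running syndrome by $\alpha^{iP}$ and adds the P incoming bits weighted by $\alpha^{0}, \alpha^{i}, \dots, \alpha^{(P-1)i}$. The Python sketch below checks that P = 1, 2, and 4 give the same result as direct evaluation. It assumes that the highest-order bit $r_{n-1}$ enters first and that P divides n (true for n = 1020); the field polynomial $x^{10} + x^3 + 1$ and the names are assumptions.

```python
import random

# GF(2^10) tables; x^10 + x^3 + 1 is an assumed primitive polynomial.
M, Q, PRIM = 10, 1023, 0x409
EXP, LOG = [0] * (2 * Q), [0] * (Q + 1)
_v = 1
for _e in range(Q):
    EXP[_e] = EXP[_e + Q] = _v
    LOG[_v] = _e
    _v <<= 1
    if _v & (1 << M):
        _v ^= PRIM

def alpha(e):
    return EXP[e % Q]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def syndrome_direct(r_hd, i):
    s = 0
    for j, bit in enumerate(r_hd):
        if bit:
            s ^= alpha(i * j)
    return s

def syndrome_parallel(r_hd, i, p):
    """S_i computed P bits per clock, highest-degree bit first."""
    n = len(r_hd)
    if n % p:
        raise ValueError("P must divide n (pad the word otherwise)")
    step = alpha(i * p)                          # feedback constant alpha^(iP)
    taps = [alpha(i * q) for q in range(p)]      # alpha^0, alpha^i, ..., alpha^((P-1)i)
    s = 0
    for top in range(n - 1, -1, -p):             # one loop iteration per clock cycle
        acc = 0
        for q in range(p):                       # bit r_(top-q) has weight alpha^(i(P-1-q))
            if r_hd[top - q]:
                acc ^= taps[p - 1 - q]
        s = gf_mul(s, step) ^ acc
    return s

rng = random.Random(7)
word = [rng.randint(0, 1) for _ in range(1020)]
```

The loop runs n/P times, which is why raising P shortens the syndrome phase roughly in proportion.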

B. Pattern Syndrome Generator

Fig. 4(b) shows the PSG block, which generates the test syndrome factors in sequence. This block consists of the test syndrome selection multiplexer, GF multipliers for the third and fifth powers of the test syndrome, and feedback circuits that generate the pattern syndromes. The selection signal sel = {selα2, selα1, selα0} repeatedly rotates through "001," "010," "001," and "100."

After this sequence has been applied twice, the PSG has generated all the test syndrome powers needed to calculate the syndrome factors of all test patterns. The pattern syndrome is calculated by adding the syndromes Si (i = 1, 3, 5) from the SC block and the test syndrome powers (αt0)^i, (αt1)^i, (αt2)^i (i = 1, 3, 5) from the TSC block.

C. Sharing Syndrome Factor Calculator

Fig. 4(c) shows the optimized SSFC block, which computes all syndrome factors in sequence. The syndrome factors Adeg2, Adeg3, Bdeg2, Bdeg3, Cdeg2, Cdeg3, Rdeg2, and Rdeg3 are computed from the pattern syndrome values Stp,1, Stp,3, Stp,5 supplied by the PSG. The block consists of GF multipliers and GF adders. Because the syndrome factor computation in the previous work [4] was not hardware-efficient, the proposed SSFC restructures the syndrome factors to increase hardware efficiency. Depending on the syndrome factor Rdeg2, one set of syndrome factors is selected and fed to the pre-CS and MC blocks.

D. Pre-Chien Search and Metric Check

Fig. 4(d) shows the detailed pre-CS and MC block.

Fig. 3. Test syndrome computation (TSC).

Fig. 2. Proposed syndrome-based SD-BCH decoder architecture.


Because there are eight test patterns, eight pre-CS and MC blocks operate in parallel without iteration. The sharing syndrome factor calculator feeds the syndrome factors Ainit, Binit, Cinit, and Rinit to this block. The pre-CS and MC blocks then count the errors located by each polynomial by substituting each candidate root into the equation. If the condition Rinit = Ainit·α^i + Binit·α^{2i} + Cinit·α^{3i} is met, the counter in the MC block is incremented; otherwise, the counter holds its value. The controller uses the metric check values MHD, Mtp1, Mtp2, Mtp3, Mtp4, Mtp5, Mtp6, and Mtp7 to decide the proper error polynomial. Each value is zero, one, two, or three, depending on the number of errors.

3. Test Syndrome Chien Search

The TSCS block consists of a GF multiplier, GF adders, NOR gates, and D flip-flops (D-FFs), as shown in Fig. 5. The TSCS block recovers the LRB locations from the test syndrome values. The loop circuit generates the GF element α^j, where j = 0, …, n-1. When a test syndrome αt0, αt1, or αt2 equals the GF element α^j, the corresponding LRB location flag et0, et1, or et2 becomes one; otherwise, it is zero.

4. Error Polynomial Decision and Chien Search

The EPD block consists of multiplexers, the merge test syndrome Chien search, and the Chien search block, as shown in Fig. 6. The control signal ctr from the decision controller indicates the proper test pattern for error correction. According to this signal, the coefficients of the error polynomial Rtpi = Atpi·α^i + Btpi·α^{2i} + Ctpi·α^{3i} and the corresponding merged TSCS result are selected to initialize the Chien search. Once the coefficients are selected, the Chien search block uses them to generate the error vector.

Fig. 4. Hard-decision kernel: (a) syndrome computation, (b) pattern syndrome generator, (c) optimized sharing syndrome factor calculator, (d) pre-Chien search and metric check.

Fig. 5. Test syndrome Chien search (TSCS).


The LRB locations et0, et1, and et2 must be combined for the soft decision. The merged TSCS shown in Fig. 6 combines the LRB locations of each test pattern. The results are the inputs of a multiplexer and are selected by the ctr signal. After the CS is initialized and the merged TSCS result is selected, the outputs of the Chien search and the merged TSCS are added (XOR). The EPD block then generates the error pattern di, which is added (XOR) to the received bits from the FIFO, where i = 0, …, n-1.
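Which flags are merged for each test pattern follows from the PSG sequence: starting from no flips, the added test syndromes are αt0, αt1, αt0, αt2, αt0, αt1, αt0, so pattern j flips the LRB set encoded by the masks below. This sketch (hypothetical names) derives the merged value $e_{ts,j}$ at one position from the three TSCS flags; for example, pattern 3 flips only the LRB of αt1, and pattern 7 only that of αt2.

```python
SELECT = [0, 1, 0, 2, 0, 1, 0]     # test syndrome added at steps j = 1..7 (PSG rotation)

def pattern_masks():
    """Bit g of mask j is set when test pattern j flips the LRB of alpha^t_g."""
    masks = [0]
    for g in SELECT:
        masks.append(masks[-1] ^ (1 << g))
    return masks

def merged_tscs(h, j):
    """e_ts,j at one position, given the TSCS flags h = (H_t0, H_t1, H_t2) there."""
    mask = pattern_masks()[j]
    out = 0
    for g in range(3):
        if (mask >> g) & 1:
            out ^= h[g]                 # XOR in the flag of every flipped LRB
    return out

print(pattern_masks())
```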

5. Decision Controller

The decision controller generates the control signal ctr, which selects one of the candidate polynomials and the corresponding soft-decision error pattern in the EPD block. The signal is determined from the degree information deg_tpi = {deg2,tpi, deg3,tpi} from the SSFC and the metric check value Mtpi from the pre-CS and MC blocks, where i = 0, 1, …, 7. If the degree information deg is three, the metric check value M must be three; if deg is two, M must be two; and if deg is zero, M can be one or zero. A polynomial that does not meet this condition has a violation and cannot be selected. The controller checks each polynomial for violations based on its degree information and metric value; the zeroth polynomial has the highest priority and the seventh has the lowest.

V. RESULTS AND COMPARISON

The proposed SD-BCH decoder architecture was designed in Verilog HDL and simulated to verify its functionality using test vectors generated by a C simulator. After the functionality of all blocks had been verified, the design was synthesized with appropriate timing and area constraints. Both simulation and synthesis were performed using Synopsys design tools and 65-nm CMOS technology.

Table 1 summarizes the hardware complexity of the proposed (1020, 990) SD-BCH decoder as a function of the parallel factor P. The decoder is built from basic building blocks: GF multipliers, GF adders, multiplexers, D flip-flops, and comparators. Increasing P increases the hardware complexity. However, P does not affect some of the basic building blocks (GF multipliers, GF adders, multiplexers, and D flip-flops) in the SSFC and PSG.

Table 2 compares the proposed HD-BCH decoder architecture used as the hard-decision kernel with the previous HD-BCH decoder architecture [4]. Because these HD-BCH decoders were synthesized with different CMOS standard cell libraries, we normalize the area of each architecture to the area of a 2-input NAND gate for a fair comparison. The computation of the normalized throughput is given in the footnote of Table 2. The proposed HD-BCH decoder architecture has a gate count of 3,024, whereas the conventional HD-BCH decoder architecture has a gate count of 3,275. The results show that the proposed HD-BCH decoder kernel has lower hardware complexity and lower latency than the conventional HD-BCH decoder [4], while achieving a higher normalized throughput (2.8 Gbps versus 2.22 Gbps).

Table 1. Hardware complexity of the (1020, 990) SD-BCH decoder with parallel factor (P)

| Building block | SC | TSC | SSFC | PSG | pre-CS and MC | TSCS | EPD |
|---|---|---|---|---|---|---|---|
| GF mult. | P+1 | P | 6 | 2 | 24P | P | 3P |
| GF add. | P+1 | - | 3 | 4 | 24P | 3P | 8P |
| Mux | 2 | 6 | 4 | 3 | 32 | 1 | 8 |
| D-FF | 2 | 6 | - | 1 | 40 | 1 | 4 |
| Comp. | - | P | - | - | - | - | - |

Fig. 6. Error polynomial decision (EPD).

Table 3 shows the implementation results of the proposed (1020, 990) SD-BCH decoder with several parallel factors (P = 1, 2, 4) compared with the previous (1020, 990) SD-BCH decoder with P = 1 [5]. For this comparison, the (1020, 990) SD-BCH decoder based on the previous approach [5] was also implemented using a 65-nm CMOS standard cell library. The proposed SD-BCH decoder outperforms the decoder based on the previous work [5] in both gate count and efficiency. Compared with the previous decoder with P = 1 [5], our decoder achieves about 3.23 times higher hardware efficiency and reduces the gate count by 69%. Furthermore, as the parallel factor P increases, the proposed SD-BCH decoders achieve much higher efficiency and lower latency. The proposed SD-BCH decoder with P = 4 has a latency of 518 clock cycles and a throughput of 1,937 Mbps at the maximum clock frequency of 500 MHz.

VI. CONCLUSIONS

This paper presents a high-performance syndrome-based SD-BCH decoder architecture and efficient techniques for its design. A hardware-friendly syndrome-based SD-BCH decoding algorithm and scheme are proposed and adopted in the SD-BCH decoder architecture. In addition, a novel pattern syndrome generator, hard-decision kernel, and error polynomial decision block are proposed. The proposed SD-BCH decoder architecture achieves much better BER performance than an HD-BCH decoder and has significantly lower hardware complexity than a conventional SD-BCH decoder architecture. The results also show that the proposed SD-BCH decoders achieve especially high hardware efficiency at high parallel factors (P).

ACKNOWLEDGMENTS

This research was supported by the Basic Science Research Program through the NRF, funded by the MSIT(Ministry of Science, ICT) (2016R1A2B4015421), and in part by the MSIT, Korea, under the ITRC support program (IITP-2018-2014-1-00729) supervised by the IITP.

Table 2. Implementation result of (1020, 990) HD-BCH decoder for hard-decision kernel

| | Proposed HD-BCH | Previous HD-BCH [4] |
|---|---|---|
| CMOS tech. | 65 nm | 90 nm |
| Gate count (NAND) | 3,024 | 3,275 |
| Latency (cycles) | 255 + 1 | 258 |
| Max. clock freq. (MHz) | 700 | 400 |
| Throughput (Gbps) | 2.8 | 1.6 |
| Normalized throughput (Gbps)† | 2.8 | 2.22 |

† Normalized throughput (in 65 nm) = throughput × (technology / 65 nm)

Table 3. Implementation results of the (1020, 990) SD-BCH decoders

| Designs @ 500 MHz (2 ns) | Proposed SD-BCH, P = 1 | Proposed SD-BCH, P = 2 | Proposed SD-BCH, P = 4 | Previous SD-BCH [5], P = 1 |
|---|---|---|---|---|
| CMOS tech. | 65 nm | 65 nm | 65 nm | 65 nm |
| Total gate count (NAND)* | 24,692 | 25,850 | 28,078 | 79,773 |
| Latency (cycles) | 2,048 | 1,028 | 518 | 2,048 |
| Max. clock freq. (MHz) | 500 | 500 | 500 | 500 |
| Throughput (Mbps)† | 484 | 968 | 1,937 | 484 |
| Efficiency (Mbps/K gate) | 19.60 | 37.45 | 68.99 | 6.07 |

* Gate count is calculated by normalizing the area to that of a 2-input NAND gate, the unit gate area in the TSMC 65-nm CMOS library.

† Throughput is calculated as P·f·R, where P is the level of parallelism (the number of bits processed per clock cycle), f is the operating clock frequency, and R = k/n is the code rate, with k the information length and n the codeword length.

REFERENCES

[1] S. Lin and D. Costello, Error Control Coding: Fundamentals and Applications, Englewood Cliffs, NJ: Prentice-Hall, 1983.

[2] T. K. Moon, Error Correction Coding: Mathematical Methods and Algorithms, John Wiley & Sons, 2005.

[3] K. Lee, H.-G. Kang, J.-I. Park, H. Lee, “A high-speed low-complexity concatenated BCH decoder architecture for 100Gb/s optical communications,” Jour. of Signal Processing Systems, vol. 6, no. 1, pp. 43-55, Jan. 2012.

[4] J. Y. Yeon, S. J. Yang, C. H. Kim, H. Lee, “Low-Complexity Triple-Error-Correcting Parallel BCH Decoder,” Jour. of Semiconductor Technology and Science, vol. 13, no. 5, pp. 465-472, Oct. 2013.

[5] B. S. Jung, T. S. Kim, H. Lee, “Low-Complexity Non-Iterative Soft-Decision BCH Decoder Architecture for WBAN Applications,” Jour. of Semiconductor Technology and Science, vol. 16, no. 4, pp. 488-496, Aug. 2016.

[6] E. Berlekamp, Algebraic Coding Theory, World Scientific, Aegean Park Press, 1984.

[7] H. O. Burton, “Inversionless Decoding of Binary BCH Codes,” IEEE Trans. on Info. Theory, vol. IT-17, no. 4, July 1971.

[8] A. Vardy, Y. Be’ery, “Maximum-likelihood soft decision decoding of BCH codes,” IEEE Trans. on Info. Theory, vol. 40, no. 2, pp. 546-554, Mar. 1994.

[9] N. Kamiya, “On Algebraic Soft-Decision Decoding Algorithms for BCH Codes,” IEEE Trans. on Info. Theory, vol. 47, no. 1, Jan. 2001.

[10] D. Chase, “A class of algorithms for decoding block codes with channel measurement information,” IEEE Trans. on Info. Theory, vol. 18, no. 1, pp. 170-182, Jan. 1972.

[11] C.-H. Yang, T.-Y. Huang, M.-R. Li, Y.-L. Ueng, “A 5.4 μW Soft-Decision BCH Decoder for Wireless Body Area Networks,” IEEE Trans. on Circuits and Systems, vol. 61, no. 9, pp. 2721-2729, Sep. 2014.

[12] D. Chase, “A Class of Algorithms for Decoding Block Codes with Channel Measurement Information,” IEEE Trans. on Info. Theory, vol. IT-18, no. 1, pp. 170-182, Jan. 1972.

[13] W. Liu, J. Rho, W. Sung, “Low-Power High-Throughput BCH Error Correction VLSI Design for Multi-Level Cell NAND Flash Memories,” IEEE Workshop on Signal Processing Systems Design and Implementation 2006 (SIPS'06), pp. 303-308, Oct. 2006.

[14] Y.-M. Lin, H.-C. Chang, C.-Y. Lee, “An Improved Soft BCH Decoder with One Extra Error Compensation,” 2010 IEEE International Symposium on Circuits and Systems (ISCAS2010), May 2010.

[15] C.-L. Chr, S.-L. Su, S.-W. Wu, “A Low-Complexity Step-by-Step Decoding Algorithm for Binary BCH Codes,” IEICE Trans. Fundamentals, vol. E88-A, no. 1, pp. 359-365, Jan. 2005.

[16] X. Zhang, Z. Wang, “A Low-Complexity Three-Error-Correcting BCH Decoder for Optical Transport Network,” IEEE Trans. on Circuits and Systems, vol. 59, no. 10, pp. 663-667, Oct. 2012.

Taesung Kim received the B.S. degree in Information and Communication Engineering from Anyang University, Anyang, Korea, in 2015. He is currently working toward the M.S. degree at Inha University, Incheon, Korea. His research interests are digital VLSI architecture design for error correction coding.

Hanho Lee received the Ph.D. and M.S. degrees, both in Electrical and Computer Engineering, from the University of Minnesota, Minneapolis, USA, in 2000 and 1996, respectively. From April 2000 to August 2002, he was a Member of Technical Staff at Lucent Technologies (Bell Labs Innovations), Allentown, USA. From August 2002 to August 2004, he was an Assistant Professor at the Department of Electrical and Computer Engineering, University of Connecticut, USA. Since August 2004, he has been with the Department of Information and Communication Engineering, Inha University, where he is currently a Professor. From August 2010 to August 2011, he was a visiting scholar at Bell Labs, Alcatel–Lucent, Murray Hill, NJ, USA. His research interests include VLSI architecture design for forward error correction, cryptography, and communications.
