An introduction to BCH codes


I’ve been working with error correcting codes, and I had a hard time finding good, consolidated sources online which explained the background at a level of detail that I was comfortable with. Things were either very high level, which didn’t give enough details to implement a real solution, or overly theoretical, which won’t appeal to anyone not accustomed to the theorem, lemma, proof style of graduate math textbooks. What I hope to do here is give an explanation of cyclic error correcting codes and BCH codes, aimed at engineers and applied scientists who want to understand an implementation and get a flavor of the theory, without going into proofs. For this reason, I’ll just state results without proving them. Instead, I will try to give an outline of how to implement each part of the process. Those interested in a more complete picture are encouraged to refer to Lin and Costello.

This blog also assumes familiarity with regular linear error correcting codes (like Hamming codes), so I won’t go over any general background on error correcting codes.

The blog will be divided into the following sections:

  1. Background on finite fields
  2. Introduction to binary cyclic linear error correcting codes
  3. Introduction to BCH codes
  4. Encoding BCH codes
  5. Decoding BCH codes

Finite fields

The mathematics of cyclic error correcting codes is based on finite fields, so it’s impossible to have a meaningful discussion of cyclic codes without first giving the necessary background on finite fields. We’ll try to cover the minimum necessary material to understand future sections, with examples along the way to aid understanding. Let’s get started!

Finite fields, or Galois fields, written $GF(q)$, are fields (i.e. they are closed under, and have inverses for, addition and multiplication) which have a finite number of elements. The simplest examples are the integers modulo a prime – $GF(2)$ is the field $\{0, 1\}$ with addition and multiplication modulo 2, $GF(3)$ is the field $\{0, 1, 2\}$ with addition and multiplication modulo 3, etc. To aid understanding, let’s write out the addition and multiplication tables for $GF(5)$. When adding or multiplying, everything is modulo 5, so $4 + 4 = 8 \bmod 5 = 3$, and $4 \times 3 = 12 \bmod 5 = 2$, for example.

The addition table is:

+ 0 1 2 3 4
0 0 1 2 3 4
1 1 2 3 4 0
2 2 3 4 0 1
3 3 4 0 1 2
4 4 0 1 2 3

The multiplication table is:

x 0 1 2 3 4
0 0 0 0 0 0
1 0 1 2 3 4
2 0 2 4 1 3
3 0 3 1 4 2
4 0 4 3 2 1
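
If you want to generate tables like these yourself, here’s a minimal Python sketch (the function name is my own) that reproduces them for any prime modulus:

```python
# Print the addition and multiplication tables for the integers modulo a
# prime p, i.e. GF(p). Every result reduces modulo p.
def print_tables(p):
    for symbol, op in (("+", lambda a, b: (a + b) % p),
                       ("x", lambda a, b: (a * b) % p)):
        print(symbol, *range(p))
        for a in range(p):
            print(a, *(op(a, b) for b in range(p)))
        print()

print_tables(5)
```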

Galois fields must have a number of elements (order) equal to a prime power, and all Galois fields with the same order are isomorphic to each other. To construct a Galois field with a non-prime number of elements (i.e. a prime power number of elements), or an extension field, we use polynomials instead of integers to represent field elements. To construct the extension field $GF(q^m)$ from the base field $GF(q)$, we use $GF(q)[x]$ modulo $p(x)$, written $GF(q)[x]/p(x)$. $GF(q)[x]$ is pronounced “GF(q) adjoin x”, and represents the set of all polynomials in $x$ whose coefficients are in $GF(q)$, and $p(x)$ is an irreducible polynomial over $GF(q)$ of degree $m$. Irreducible polynomials are polynomials which cannot be factored into polynomials of lower degree over the same field. For example, in $GF(2)[x]$, $x^2 + x + 1$ is irreducible, but $x^2 + 1$ is not, since $(x + 1)(x + 1) = x^2 + 2x + 1$, but $2 = 0$, so $x^2 + 2x + 1 = x^2 + 1$. The irreducible polynomials of the first few degrees for $GF(2)$ are:

Degree   Irreducible Polynomials
2        $x^2 + x + 1$
3        $x^3 + x + 1$, $x^3 + x^2 + 1$
4        $x^4 + x + 1$, $x^4 + x^3 + 1$, $x^4 + x^3 + x^2 + x + 1$
5        $x^5 + x^2 + 1$, $x^5 + x^3 + 1$, $x^5 + x^3 + x^2 + x + 1$, $x^5 + x^4 + x^2 + x + 1$, $x^5 + x^4 + x^3 + x + 1$, $x^5 + x^4 + x^3 + x^2 + 1$

There are, in general, multiple irreducible polynomials of each degree for each base field. In practice, we choose the one that has the smallest weight (so we prefer to use $x^4 + x + 1$ instead of $x^4 + x^3 + x^2 + x + 1$, for example). Since all finite fields of a given order are isomorphic, constructing an extension field of a particular order with any valid irreducible polynomial creates an isomorphic finite field (the addition and multiplication tables will just have different entries). Let’s do an example of constructing an extension field from the base field $GF(3)$. We choose $p(x) = x^2 + 1$, which is an irreducible polynomial of degree 2 in $GF(3)[x]$. $GF(9) = GF(3)[x]/(x^2 + 1)$ is then the set of all polynomials with coefficients in $GF(3)$ modulo $x^2 + 1$, or the set $\{0, 1, 2, x, x + 1, x + 2, 2x, 2x + 1, 2x + 2\}$, which has 9 elements. Addition and multiplication using this polynomial representation is modulo $x^2 + 1$ and modulo 3. Again, for understanding, let’s write out the multiplication table (the addition table is easier to calculate) for $GF(9)$ and work out some specific examples.

The multiplication table is:

*      0     1     2     x     x+1   x+2   2x    2x+1  2x+2
0      0     0     0     0     0     0     0     0     0
1      0     1     2     x     x+1   x+2   2x    2x+1  2x+2
2      0     2     1     2x    2x+2  2x+1  x     x+2   x+1
x      0     x     2x    2     x+2   2x+2  1     x+1   2x+1
x+1    0     x+1   2x+2  x+2   2x    1     2x+1  2     x
x+2    0     x+2   2x+1  2x+2  1     x     x+1   2x    2
2x     0     2x    x     1     2x+1  x+1   2     2x+2  x+2
2x+1   0     2x+1  x+2   x+1   2     2x    2x+2  x     1
2x+2   0     2x+2  x+1   2x+1  x     2     x+2   1     2x

Here are some randomly selected worked out examples so you can get a feel for modular arithmetic on polynomials. For each of these, remember that $3 = 0$ and $3x = 0$, so adding 3 or $3x$ is the same as adding 0. I also want to reduce $x^2 + 1$ to 0 (i.e. substitute $x^2 = 2$) to get rid of $x^2$ terms, since the result of an operation should always be a member of the field (closure) and the elements in the field only go up to degree 1.

$$(x + 2) + (2x + 2) = 3x + 4 = 0 + 1 = 1$$

$$2(2x + 2) = 4x + 4 = x + 1$$

$$(x + 1)(x + 1) = x^2 + 2x + 1 = 2 + 2x + 1 = 2x + 3 = 2x$$

$$(2x + 1)(x + 2) = 2x^2 + 5x + 2 = 2x^2 + 2x + 2 = 2 \cdot 2 + 2x + 2 = 2x + 6 = 2x$$
To someone who has done regular boring arithmetic their whole life, modular arithmetic can seem strange and unwieldy at first, but you may find it more enjoyable once you get the hang of it!
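
Here’s the same arithmetic as a short Python sketch (the pair representation and helper names are my own), which you can check against the table and worked examples above:

```python
# Elements of GF(9) = GF(3)[x]/(x^2 + 1) are stored as pairs [b, a]
# meaning a*x + b. Coefficients reduce mod 3, and x^2 reduces to 2.
Q = 3

def add(u, v):
    return [(a + b) % Q for a, b in zip(u, v)]

def mul(u, v):
    b, a = u
    d, c = v
    # (ax + b)(cx + d) = ac*x^2 + (ad + bc)*x + bd, then substitute x^2 = 2
    return [(b * d + 2 * a * c) % Q, (a * d + b * c) % Q]

print(add([2, 1], [2, 2]))  # (x + 2) + (2x + 2) -> [1, 0], i.e. 1
print(mul([1, 2], [2, 1]))  # (2x + 1)(x + 2)    -> [0, 2], i.e. 2x
```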

A primitive element, $\alpha$, of a Galois field, is an element whose successive powers generate all elements of the field except 0. In other words, it is a primitive $(n-1)$th root of unity, where $n$ is the number of elements in the field. Finding primitive elements is a matter of brute force: try all elements other than 0 and 1. This is easier than it sounds though, since a field with $n$ elements contains exactly $\phi(n-1)$ primitive elements ($\phi$ being Euler’s totient function) – often a sizeable fraction of the field. For example, 2 is a primitive element of $GF(3)$ and $GF(5)$, but not $GF(7)$. $x + 1$ is a primitive element of the $GF(9)$ we constructed above. To see why, let’s do the multiplication, which is made easy by jumping to the correct cell in the multiplication table above.

$$(x + 1)^1 = x + 1$$

$$(x + 1)^2 = 2x$$

$$(x + 1)^3 = 2x(x + 1) = 2x + 1$$

$$(x + 1)^4 = (2x + 1)(x + 1) = 2$$

$$(x + 1)^5 = 2(x + 1) = 2x + 2$$

$$(x + 1)^6 = (2x + 2)(x + 1) = x$$

$$(x + 1)^7 = x(x + 1) = x + 2$$

$$(x + 1)^8 = (x + 2)(x + 1) = 1$$
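
The brute-force search itself is equally short. This sketch (helper names are my own) confirms that $x+1$, $x+2$, $2x+1$, and $2x+2$ are exactly the primitive elements of our $GF(9)$:

```python
# An element of GF(9) is primitive iff its multiplicative order is 8,
# the number of nonzero field elements. Elements are [b, a] = a*x + b.
Q = 3

def mul(u, v):
    b, a = u
    d, c = v
    return [(b * d + 2 * a * c) % Q, (a * d + b * c) % Q]  # x^2 -> 2

def order(e):
    acc, n = e, 1
    while acc != [1, 0]:        # [1, 0] represents the element 1
        acc = mul(acc, e)
        n += 1
    return n

elems = [[b, a] for a in range(Q) for b in range(Q) if [b, a] != [0, 0]]
print([e for e in elems if order(e) == 8])
# -> [[1, 1], [2, 1], [1, 2], [2, 2]], i.e. x+1, x+2, 2x+1, 2x+2
```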
We’ve been using the base field $GF(3)$ in this section to give an example with field order higher than 2. In the remainder of this blog though, we’ll use $GF(2)$ as the base field in examples, since ultimately we’re interested in binary codes, for which $GF(2)$ is always the base field.

There is a unique polynomial $\phi(x)$ over $GF(q)$ for each element $\beta$ in $GF(q^m)$ which is the monic polynomial of smallest degree such that $\phi(\beta) = 0$, called the minimal polynomial of $\beta$. A primitive polynomial is the minimal polynomial of a primitive element. As an example, let’s construct the finite field $GF(2^5)$, or $GF(32)$. We use $p(x) = \alpha^5 + \alpha^2 + 1$. Some examples of elements are: 0, 1, $\alpha$, $\alpha^2 + 1$, $\alpha^3 + \alpha$, $\alpha^4 + \alpha^3 + \alpha$, etc. For a primitive element $\alpha$, we can generate each element of $GF(32)$ by successive multiplication by $\alpha$, as shown below. We use $\alpha$ instead of $x$ as the symbol for finite field polynomial terms because we will start reserving $x$ for minimal polynomials.

$$\alpha^1 = \alpha$$

$$\alpha^2 = \alpha^2$$

$$\alpha^3 = \alpha^3$$

$$\alpha^4 = \alpha^4$$

$$\alpha^5 = \alpha^2 + 1 \quad \text{(since we're in modulo 2 and modulo } \alpha^5 + \alpha^2 + 1\text{)}$$

$$\alpha^6 = \alpha^3 + \alpha$$

$$\alpha^7 = \alpha^4 + \alpha^2$$

$$\alpha^8 = \alpha^3 + \alpha^2 + 1$$

$$\alpha^9 = \alpha^4 + \alpha^3 + \alpha$$

…and so forth, until $\alpha^{31} = \alpha^0 = 1$.
Each $\alpha^i$ has a minimal polynomial associated with it, and each minimal polynomial has at least one $\alpha^i$ as a root. They can be found through the fact that if $\beta$ is a root of a minimal polynomial over the base field $GF(2)$, then so is $\beta^2$. So for our example using $GF(32)$, $\alpha$, $\alpha^2$, $\alpha^4$, $\alpha^8$, $\alpha^{16}$ all share the same minimal polynomial, and so do $\alpha^3$, $\alpha^6$, $\alpha^{12}$, $\alpha^{24}$, $\alpha^{17}$ (17 = 48 mod 31). To find the minimal polynomial for $\alpha^3$, $\alpha^6$, $\alpha^{12}$, $\alpha^{24}$, $\alpha^{17}$, we can calculate $(x + \alpha^3)(x + \alpha^6)(x + \alpha^{12})(x + \alpha^{24})(x + \alpha^{17})$ and simplify. All the $\alpha$’s should cancel out and we should be left with a polynomial in $x$ only. We’ll go over a more explicit example where minimal polynomials are computed in Section 3.
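
If you’d like to see that cancellation happen without grinding through the algebra, here’s a Python sketch (the bit-packed representation and helper names are my own) that builds $GF(32)$, collects the conjugates of $\alpha^3$, and expands the product:

```python
# GF(2^5) built with p(x) = x^5 + x^2 + 1. Field elements are 5-bit ints
# where bit i holds the coefficient of alpha^i.
M, P = 5, 0b100101

def gf_mul(a, b):
    r = 0
    while b:                        # carry-less (XOR-based) multiplication
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    for i in range(2 * M - 2, M - 1, -1):
        if r & (1 << i):            # reduce modulo p(alpha)
            r ^= P << (i - M)
    return r

alpha, x = [], 1
for _ in range(31):                 # alpha[i] = alpha^i as a bit pattern
    alpha.append(x)
    x = gf_mul(x, 0b10)

exps, e = [], 3                     # conjugates of alpha^3: square repeatedly,
while e not in exps:                # i.e. double the exponent mod 31
    exps.append(e)
    e = 2 * e % 31
print(exps)                         # [3, 6, 12, 24, 17]

poly = [1]                          # coefficients in GF(32), lowest degree first
for e in exps:                      # multiply by (x + alpha^e)
    new = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        new[i + 1] ^= c                   # the x * poly(x) part
        new[i] ^= gf_mul(alpha[e], c)     # the alpha^e * poly(x) part
    poly = new
print(poly)   # [1, 0, 1, 1, 1, 1]: every coefficient collapses to 0 or 1,
              # giving the minimal polynomial x^5 + x^4 + x^3 + x^2 + 1
```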

Linear binary cyclic codes

Next, let’s discuss linear binary cyclic codes.

  • Linear: The weighted sum of any two codewords is also a codeword

  • Binary: The codeword symbols are 0 and 1

  • Cyclic: Any cyclic shift of a codeword is another codeword

For example, all sets of valid 4-length linear binary cyclic codes are:

  • {0000} (the trivial zero code)
  • {0000, 1111}
  • {0000, 0101, 1010, 1111}
  • {0000, 0011, 0110, 1100, 1001, 0101, 1010, 1111} (all even-weight words)
  • {0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111} (all 4-bit words)

Binary linear cyclic codes of length $n$ are represented by polynomials in $GF(2)[x]/(x^n + 1)$. In this representation, multiplying by $x$ amounts to a cyclic left-shift by 1, since $x \cdot x^{n-1} = x^n = 1$. Polynomials over $GF(2)$ naturally work to represent sequences of binary numbers, since their coefficients are either 0 or 1, so the resulting polynomials can be thought of as binary messages with the most significant bit (MSB) at either the highest or the lowest degree of $x$. Binary modular arithmetic and cyclic shifting are also easy to implement in hardware circuits, with exclusive-OR gates and shift registers.

Furthermore, this means that ANY polynomial in $GF(2)[x]/(x^n + 1)$ multiplied by a valid codeword is another valid codeword, since valid codewords are some subset of $GF(2)[x]/(x^n + 1)$, and multiplication by an arbitrary polynomial is a linear combination of cyclic shifts of the original codeword. Since all shifted codewords are valid codewords and all linear combinations of codewords are valid codewords, the resulting polynomial must be a valid codeword.

The question, then, is: how do we generate codewords from messages? I.e. how do we choose a generator polynomial $g(x)$ which maps all valid messages to all valid codewords? Furthermore, how do we DESIGN the generator polynomial so that we get the block length, message length, and error correction capability that we want?

For a cyclic code $C(n, k)$:

  • The generator polynomial $g(x)$ is monic (its highest power has coefficient 1)
  • $C$ consists of all multiples of $g(x)$ with polynomials of degree $k - 1$ or less
  • $g(x)$ is a factor of $x^n + 1$

The third point is the most important, and means that we are in the business of factoring $x^n + 1$. We want the prime factors of $x^n + 1$ in $GF(2)[x]$, and $g(x)$ is the polynomial with the smallest degree which generates the codeword polynomials we want. So if we have codewords of degree $n - 1$ and messages of degree $k - 1$, then $g(x)$ must have degree $n - k$.

For example, all possible cyclic codeword polynomials of length three reside in $GF(2)[x]/(x^3 + 1)$. $x^3 + 1$ factors into $(x + 1)(x^2 + x + 1)$, so the possible generator polynomials are:

$g(x) = 1$ (degree 0)
$g(x) = x + 1$ (degree 1)
$g(x) = x^2 + x + 1$ (degree 2)
$g(x) = (x + 1)(x^2 + x + 1) = x^3 + 1$ (degree 3)

  • for $g(x) = 1$, the degree of the message is 2. This is the trivial identity mapping, which maps all codewords of length 3 to themselves.
  • for $g(x) = x + 1$, the degree of the message is 1. This is a mapping between messages of length 2 and codewords of length 3 (the set of codewords is exactly the even-weight words). The snippet below enumerates these mappings.
  • for $g(x) = x^2 + x + 1$, the degree of the message is 0. This is a mapping between messages of length 1 and codewords of length 3. There are only two messages of length 1: 0, and 1, and only two codewords: 000, 111, so this is a repetition code!
  • for $g(x) = x^3 + 1$, because $x^3 + 1 = 0$ in $GF(2)[x]/(x^3 + 1)$, everything gets trivially mapped to 0.
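
Here’s that enumeration as a Python sketch (helper names and the integer bit-packing are my own):

```python
# Enumerate the length-3 cyclic codes generated by each nonzero divisor of
# x^3 + 1. Polynomials are ints with bit i holding the coefficient of x^i;
# since deg(m) + deg(g) <= 2 here, no reduction mod x^3 + 1 is needed.
def clmul(a, b):                    # carry-less polynomial multiplication
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    return r

for g, k in [(0b1, 3), (0b11, 2), (0b111, 1)]:  # g(x) = 1, x+1, x^2+x+1
    words = sorted(clmul(m, g) for m in range(1 << k))
    print(f"g = {g:03b}:", [f"{w:03b}" for w in words])
# g = 001: all eight 3-bit words
# g = 011: 000, 011, 101, 110 (even parity)
# g = 111: 000, 111 (repetition)
```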

The beauty and usefulness of creating codes in this way is that decoding a received code has the following properties:

For a received codeword $r(x)$,

$$r(x) = v(x) + e(x)$$

where $v(x) = m(x)g(x)$ is the true transmitted codeword, and $e(x)$ is the error. Since we know which roots $\alpha^i$ we constructed $g(x)$ with (more on this in the next section), $v(x)$ is 0 at any of those $\alpha^i$’s. So

$$r(\alpha^i) = v(\alpha^i) + e(\alpha^i) = e(\alpha^i)$$

furthermore,

$$r(x) \bmod g(x) = e(x) \bmod g(x)$$

since $v(x) \bmod g(x) = 0$. These two facts help us locate and correct errors.

BCH Codes

BCH codes give an easy and useful way of defining $g(x)$ given a blocklength and a number of error correction bits. Recall that, in order to find valid generator polynomials, we are looking for factors of $x^n + 1$. To link the factorization of $x^n + 1$ with finite fields, we use the fact that the roots of $x^{2^m - 1} + 1$ are the non-zero elements of $GF(2^m)$!

In other words, the roots of $x^{2^m - 1} + 1$ form a finite field of order $2^m$ if 0 is added. So for the elements $\alpha^i$ of $GF(2^m)$,

$$x^{2^m - 1} + 1 = \prod_{i=0}^{2^m - 2} (x + \alpha^i)$$

We have already seen that some of the factors of $x^n + 1$ combine to form prime factors, and this is true here as well. Each non-zero element of $GF(2^m)$ is the root of EXACTLY ONE of the prime factors of $x^{2^m - 1} + 1$. Therefore, $g(x)$ must be the least common multiple (LCM) of some of the minimal polynomials of the $\alpha^i$’s. Which ones depends on the code we are trying to create. The key concept of primitive, narrow-sense BCH codes is that, for $t$ bits of desired error correcting capability, the generator polynomial is the lowest degree polynomial with $\alpha^1, \alpha^2, \ldots, \alpha^{2t}$ as its roots.

The steps to construct a BCH code are:

  1. For a given blocklength $n$ with $n = 2^m - 1$, construct a finite field $GF(2^m)$
  2. Find the minimal polynomials $\phi_i(x)$ of the elements $\alpha^i$ of $GF(2^m)$
  3. The generator polynomial is $g(x) = \text{LCM}(\phi_1(x), \phi_2(x), \ldots, \phi_{2t}(x))$, where $t$ is the number of error correcting bits desired. The Hamming distance is $d = 2t + 1$
  4. The message length is $k = n - \deg(g(x))$. This creates an $(n, k, d)$ BCH code.

Let’s do an example for a code of blocklength $n = 15$. This requires us to construct the extension field $GF(16) = GF(2^4)$, which we can do with the irreducible polynomial $x^4 + x + 1$. So our extension field is $GF(2)[x]/(x^4 + x + 1)$. The field elements are:

element        polynomial                          binary
$0$            $0$                                 0000
$1$            $1$                                 0001
$\alpha$       $\alpha$                            0010
$\alpha^2$     $\alpha^2$                          0100
$\alpha^3$     $\alpha^3$                          1000
$\alpha^4$     $\alpha + 1$                        0011
$\alpha^5$     $\alpha^2 + \alpha$                 0110
$\alpha^6$     $\alpha^3 + \alpha^2$               1100
$\alpha^7$     $\alpha^3 + \alpha + 1$             1011
$\alpha^8$     $\alpha^2 + 1$                      0101
$\alpha^9$     $\alpha^3 + \alpha$                 1010
$\alpha^{10}$  $\alpha^2 + \alpha + 1$             0111
$\alpha^{11}$  $\alpha^3 + \alpha^2 + \alpha$      1110
$\alpha^{12}$  $\alpha^3 + \alpha^2 + \alpha + 1$  1111
$\alpha^{13}$  $\alpha^3 + \alpha^2 + 1$           1101
$\alpha^{14}$  $\alpha^3 + 1$                      1001

(note, $\alpha^{15} = \alpha^0 = 1$)
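
The table is mechanical to generate, since each row is just the previous row multiplied by $\alpha$, with $\alpha^4$ replaced by $\alpha + 1$ whenever it appears. A minimal Python sketch (the representation is my own):

```python
# Generate GF(16) as successive powers of alpha with p(x) = x^4 + x + 1.
# Elements are 4-bit ints where bit i holds the coefficient of alpha^i.
exp, x = [], 1
for _ in range(15):
    exp.append(x)                   # exp[i] = alpha^i
    x <<= 1                         # multiply by alpha
    if x & 0b10000:
        x ^= 0b10011                # substitute alpha^4 = alpha + 1
log = {v: i for i, v in enumerate(exp)}   # inverse lookup (discrete log)

for i, v in enumerate(exp):
    print(f"alpha^{i:<2} {v:04b}")
```

The `log` dictionary isn’t needed just to print the table, but it turns multiplication into addition of exponents, which the decoding sketches later on rely on.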

The first minimal polynomial is associated with $\alpha$, $\alpha^2$, $\alpha^4$, $\alpha^8$, so

$$\phi_1(x) = (x + \alpha)(x + \alpha^2)(x + \alpha^4)(x + \alpha^8) = x^4 + x + 1$$

similarly, $\phi_3(x)$ is associated with $\alpha^3$, $\alpha^6$, $\alpha^{12}$, $\alpha^9$ (24 − 15 = 9), so

$$\phi_3(x) = (x + \alpha^3)(x + \alpha^6)(x + \alpha^{12})(x + \alpha^9) = x^4 + x^3 + x^2 + x + 1$$

and so forth:

$$\phi_5(x) = (x + \alpha^5)(x + \alpha^{10}) = x^2 + x + 1$$

$$\phi_7(x) = (x + \alpha^7)(x + \alpha^{14})(x + \alpha^{13})(x + \alpha^{11}) = x^4 + x^3 + 1$$

So the minimal polynomials are

element                                            minimal polynomial
$\alpha, \alpha^2, \alpha^4, \alpha^8$             $x^4 + x + 1$
$\alpha^3, \alpha^6, \alpha^9, \alpha^{12}$        $x^4 + x^3 + x^2 + x + 1$
$\alpha^5, \alpha^{10}$                            $x^2 + x + 1$
$\alpha^7, \alpha^{11}, \alpha^{13}, \alpha^{14}$  $x^4 + x^3 + 1$
$\alpha^{15} = 1$                                  $x + 1$
A valid $g(x)$ is found by choosing $t$ and computing $\text{LCM}(\phi_1(x), \ldots, \phi_{2t}(x))$. For example:

  • For t=1, we get $g(x) = \phi_1(x) = x^4 + x + 1$ (note $\phi_2 = \phi_1$), and $g(x)$ has degree 4. For block length 15, this gives a (15, 11, 3) BCH code with one bit of error correcting capability.
  • For t=2, we get $g(x) = \phi_1(x)\phi_3(x) = x^8 + x^7 + x^6 + x^4 + 1$, and $g(x)$ has degree 8. This gives a (15, 7, 5) BCH code with two bits of error correcting capability.
  • For t=3, we get $g(x) = \phi_1(x)\phi_3(x)\phi_5(x) = x^{10} + x^8 + x^5 + x^4 + x^2 + x + 1$, and $g(x)$ has degree 10. This gives a (15, 5, 7) BCH code, and corrects 3 errors.
  • For t=4, we get $g(x) = \phi_1(x)\phi_3(x)\phi_5(x)\phi_7(x)$, and $g(x)$ has degree 14. This gives a (15, 1) repetition code with Hamming distance 15 and corrects 7 errors.
  • If we add the final factor, $x + 1$, then we get $x^{15} + 1$ as the generator polynomial. Since $x^{15} + 1 = 0$ in $GF(2)[x]/(x^{15} + 1)$, this generator polynomial maps everything to 0, which is a trivial mapping.
  • Also, we could have used just 1 as the generator polynomial, which maps everything to itself (i.e. a (15, 15, 1) code), which is another trivial mapping. The code sketch below builds each of these generators.
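
Since the $\phi_i(x)$ above are distinct irreducible polynomials, the LCM is just their product, so the construction is a few lines of code (helper names and the int representation are my own):

```python
# Build the BCH generator polynomials for n = 15. Polynomials are ints
# with bit i holding the coefficient of x^i.
def clmul(a, b):                    # carry-less polynomial multiplication
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    return r

phi1, phi3, phi5, phi7 = 0b10011, 0b11111, 0b111, 0b11001

g = phi1
print(f"t=1: g(x) = {g:b}, degree {g.bit_length() - 1}")
for t, phi in [(2, phi3), (3, phi5), (4, phi7)]:
    g = clmul(g, phi)               # multiply in the next minimal polynomial
    print(f"t={t}: g(x) = {g:b}, degree {g.bit_length() - 1}")
# t=1: 10011 (deg 4), t=2: 111010001 (deg 8),
# t=3: 10100110111 (deg 10), t=4: 111111111111111 (deg 14)
```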

Codes constructed this way are called primitive (i.e. only block lengths of $2^m - 1$ are used) narrow-sense (i.e. we start with $\phi_1(x)$ instead of another $\phi_i(x)$) BCH codes.

Encoding BCH codes

BCH codes can be encoded non-systematically, by simply multiplying a message polynomial by the generator polynomial, or systematically, by embedding the message itself in the codeword. The only requirement is that the transmitted codeword is divisible by $g(x)$. In the first case, we have

$$v(x) = m(x)g(x)$$

In the second case, we have

$$v(x) = m(x)x^{n-k} + r(x), \quad \text{where } r(x) = m(x)x^{n-k} \bmod g(x)$$

so, $m(x)$ gets shifted to the MSB of $v(x)$, and then the leftover bits get modified to make $v(x)$ divisible by $g(x)$.

Non-systematic encoding is trivial to implement: just multiply the message by the generator and simplify. To implement systematic encoding, we shift the message polynomial so that its MSB is at the MSB of the codeword, and then modify the zeros at the LSB of the codeword (a span of length $n - k$) in a way that makes the resulting codeword divisible by $g(x)$. Practically this looks like the following:

For a codeword represented by a polynomial of degree $n - 1$, and a message represented by a polynomial of degree $k - 1$, the generator polynomial is of degree $n - k$, so:

  1. We multiply the message polynomial by $x^{n-k}$ to shift it to MSB position in the codeword polynomial
  2. $v(x)$ must be divisible by $g(x)$ to be a valid codeword, so it can be written as $v(x) = a(x)g(x)$ for some $a(x)$ to be determined
  3. The difference between $v(x)$ and $m(x)x^{n-k}$ is the remainder $r(x)$, which we can compute by taking $m(x)x^{n-k}$ and finding the modulo with $g(x)$
  4. The encoded codeword is then $v(x) = m(x)x^{n-k} + r(x)$ (+ and − are the same in modulo 2 arithmetic). Since $r(x)$ is guaranteed to have degree less than $n - k$, it doesn’t interfere with the shifted message bits

So we need to find $r(x) = m(x)x^{n-k} \bmod g(x)$, and set those bits in the codeword after shifting the message to the MSB.

Let’s do an example of encoding a message using non-systematic and systematic encoding, using the (15, 11, 3) code that we computed previously. This code has $g(x) = x^4 + x + 1$. Our message can be any 11-length binary number, or 10th degree polynomial. We’ll choose 10100010001 as our message, or $m(x) = x^{10} + x^8 + x^4 + 1$.

For non-systematic encoding, we simply multiply $m(x)$ with $g(x)$, to get

$$v(x) = m(x)g(x) = x^{14} + x^{12} + x^{11} + x^{10} + x^9 + x^5 + x + 1$$

or 101111000100011 in binary.

For systematic encoding, we start by multiplying $m(x)$ by $x^4$ to obtain $x^{14} + x^{12} + x^8 + x^4$. We then use polynomial division mod 2 to find the remainder after dividing by $g(x)$, which happens to be zero (I never said I would choose a hard example – you can check that $x^{14} + x^{12} + x^8 + x^4 = (x^{10} + x^8 + x^7 + x^6 + x^5 + x^4)(x^4 + x + 1)$). So the systematically encoded codeword is 101000100010000, which indeed contains the original message verbatim in the MSB of the codeword.
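
Both encoders fit in a few lines if codewords are packed into integers. Here’s a sketch (helper names and the representation are my own) that reproduces the two codewords above:

```python
# Non-systematic and systematic encoders for the (15, 11) code with
# g(x) = x^4 + x + 1. Polynomials are ints (bit i = coefficient of x^i).
def clmul(a, b):                    # carry-less polynomial multiplication
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    return r

def clmod(a, b):                    # remainder of polynomial division a / b
    while a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

G, NK = 0b10011, 4                  # g(x) and n - k
m = 0b10100010001                   # m(x) = x^10 + x^8 + x^4 + 1

print(f"{clmul(m, G):015b}")                  # -> 101111000100011
shifted = m << NK                             # m(x) * x^(n-k)
print(f"{shifted ^ clmod(shifted, G):015b}")  # -> 101000100010000
```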

Decoding BCH codes

Decoding BCH codes is more involved than encoding them, but ultimately follows a procedure which is tedious but sane, and easy to implement in both software and hardware. We start with the received polynomial $r(x)$:

$$r(x) = v(x) + e(x)$$

where $v(x)$ is the transmitted codeword, and $e(x)$ is the error. The error for binary codes can be written

$$e(x) = x^{j_1} + x^{j_2} + \cdots + x^{j_\nu}$$

where $\{j_1, j_2, \ldots, j_\nu\}$ is the set of error locations. The syndromes $S_i$ are just $r(x)$ evaluated at $\alpha^i$ for $i = 1, \ldots, 2t$. An easier pen-and-paper way to evaluate the syndromes is to use the fact that, for a particular $\alpha^i$, if $\phi_i(x)$, the minimal polynomial for $\alpha^i$, is known, then

$$r(x) = a(x)\phi_i(x) + b_i(x)$$

for some $a(x)$, and where $b_i(x)$ is $r(x) \bmod \phi_i(x)$. Since $\phi_i(\alpha^i) = 0$, we can write

$$S_i = r(\alpha^i) = b_i(\alpha^i)$$
For example, for a (15, 7, 5) BCH code, if r(x) = 000000100000001, or $r(x) = x^8 + 1$, then since $t = 2$, we need to compute four syndromes, $S_1$, $S_2$, $S_3$, and $S_4$. We can do this by finding $b_i(x) = r(x) \bmod \phi_i(x)$ for $i = 1, 2, 3, 4$.

  • For the first syndrome, $b_1(x) = (x^8 + 1) \bmod (x^4 + x + 1) = x^2$, so $S_1 = b_1(\alpha) = \alpha^2$.
  • For the second syndrome, $\phi_2(x) = \phi_1(x)$, so $S_2 = b_1(\alpha^2) = \alpha^4$.
  • For the third syndrome, $b_3(x) = (x^8 + 1) \bmod (x^4 + x^3 + x^2 + x + 1) = x^3 + 1$, so $S_3 = b_3(\alpha^3) = \alpha^9 + 1 = \alpha^7$.
  • For the fourth syndrome, $\phi_4(x) = \phi_1(x)$, so $S_4 = b_1(\alpha^4) = \alpha^8$.

So $S_1 = \alpha^2$, $S_2 = \alpha^4$, $S_3 = \alpha^7$, and $S_4 = \alpha^8$.
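
As a cross-check, the syndromes can also be computed by brute-force evaluation of $r(\alpha^i)$, with no minimal polynomials involved. A sketch reusing the $GF(16)$ table construction from earlier (helper names are my own):

```python
# Evaluate S_i = r(alpha^i) directly: each set bit j of r contributes a
# term alpha^(i*j), with exponents wrapping modulo 15.
exp, x = [], 1
for _ in range(15):
    exp.append(x)
    x <<= 1
    if x & 0b10000:
        x ^= 0b10011                # alpha^4 = alpha + 1
log = {v: i for i, v in enumerate(exp)}

def syndrome(r, i):
    s = 0
    for j in range(15):
        if (r >> j) & 1:
            s ^= exp[i * j % 15]    # add alpha^(i*j)
    return s

r = 0b000000100000001               # r(x) = x^8 + 1
print([f"a^{log[s]}" if (s := syndrome(r, i)) else "0" for i in range(1, 5)])
# -> ['a^2', 'a^4', 'a^7', 'a^8']
```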

Since $r(\alpha^i) = e(\alpha^i)$, the syndromes can also be written

$$S_1 = e(\alpha) = \alpha^{j_1} + \alpha^{j_2} + \cdots + \alpha^{j_\nu}$$

$$S_2 = e(\alpha^2) = (\alpha^2)^{j_1} + (\alpha^2)^{j_2} + \cdots + (\alpha^2)^{j_\nu}$$

$$\vdots$$

$$S_{2t} = e(\alpha^{2t}) = (\alpha^{2t})^{j_1} + (\alpha^{2t})^{j_2} + \cdots + (\alpha^{2t})^{j_\nu}$$

or alternatively,

$$S_1 = \alpha^{j_1} + \alpha^{j_2} + \cdots + \alpha^{j_\nu}$$

$$S_2 = (\alpha^{j_1})^2 + (\alpha^{j_2})^2 + \cdots + (\alpha^{j_\nu})^2$$

$$\vdots$$

$$S_{2t} = (\alpha^{j_1})^{2t} + (\alpha^{j_2})^{2t} + \cdots + (\alpha^{j_\nu})^{2t}$$
Solving for the unknowns $\alpha^{j_1}, \ldots, \alpha^{j_\nu}$ is the goal of any algorithm for decoding BCH codes. However, this system of equations has many possible solutions in general. If the number of errors is no more than the designed error correcting capability $t$, then the solution which yields the smallest number of errors is the correct solution. This corresponds to the solution with the smallest $\nu$.

The syndrome-error equations can be simplified by defining the error location numbers

$$\beta_l = \alpha^{j_l}$$

after which the syndrome-error equations can be written

$$S_1 = \beta_1 + \beta_2 + \cdots + \beta_\nu$$

$$S_2 = \beta_1^2 + \beta_2^2 + \cdots + \beta_\nu^2$$

$$\vdots$$

$$S_{2t} = \beta_1^{2t} + \beta_2^{2t} + \cdots + \beta_\nu^{2t}$$
These are power sum symmetric functions. Using the $\beta_l$’s, we can define the error locator polynomial as

$$\sigma(x) = (1 + \beta_1 x)(1 + \beta_2 x) \cdots (1 + \beta_\nu x) = \sigma_0 + \sigma_1 x + \cdots + \sigma_\nu x^\nu$$

The roots of $\sigma(x)$ are $\beta_1^{-1}$, $\beta_2^{-1}$, …, $\beta_\nu^{-1}$, which are the inverses of the error locations. This leap of defining an additional polynomial might seem arbitrary, but it’s for a good reason. If $\sigma(x)$ is expanded out, the relationship between its coefficients and the $\beta_l$’s are in the form of elementary symmetric polynomials:

$$\sigma_0 = 1$$

$$\sigma_1 = \beta_1 + \beta_2 + \cdots + \beta_\nu$$

$$\sigma_2 = \beta_1\beta_2 + \beta_1\beta_3 + \cdots + \beta_{\nu-1}\beta_\nu$$

$$\vdots$$

$$\sigma_\nu = \beta_1\beta_2\cdots\beta_\nu$$
From the theory of elementary symmetric polynomials (but feel free to check that this holds for the first few equations), we know that the syndromes and the coefficients of $\sigma(x)$ must obey Newton’s identities:

$$S_1 + \sigma_1 = 0$$

$$S_2 + \sigma_1 S_1 + 2\sigma_2 = 0$$

$$S_3 + \sigma_1 S_2 + \sigma_2 S_1 + 3\sigma_3 = 0$$

$$\vdots$$

$$S_\nu + \sigma_1 S_{\nu-1} + \cdots + \sigma_{\nu-1} S_1 + \nu\sigma_\nu = 0$$

$$S_{\nu+1} + \sigma_1 S_\nu + \cdots + \sigma_{\nu-1} S_2 + \sigma_\nu S_1 = 0$$

$$\vdots$$

(Note that in a field of characteristic 2, the integer multiple $i\sigma_i$ equals $\sigma_i$ when $i$ is odd and 0 when $i$ is even.)
So defining $\sigma(x)$ in this way links the error location values $\beta_l$ with the syndromes via Newton’s identities. Our objective is to determine the error locator polynomial’s coefficients, after which, the roots of the error locator polynomial can be solved for. There may be many $\sigma(x)$’s that satisfy the identities, but we want the one with minimal degree. The process for decoding the error has three steps:

  1. Compute the syndromes by evaluating $r(x)$ at $\alpha^i$ for $i = 1, \ldots, 2t$
  2. Using the syndromes, compute the error locator polynomial $\sigma(x)$, whose roots are the inverses of the locations of the errors
  3. For binary codes, once the error locations are known, those bits just need to be flipped in the code. A practical consideration is that the location exponents effectively run from 1 to n instead of 0 to n-1, since $\alpha^0 = \alpha^n$: an error location of $\alpha^j$ with $1 \le j \le n-1$ flips the coefficient of $x^j$, while an error location of $\alpha^n = 1$ flips the coefficient of $x^0$, the least significant bit.

In this section, we’ll cover one particular algorithm for decoding BCH codes. Berlekamp’s algorithm gives an iterative procedure for solving for the $\sigma(x)$ with smallest degree that satisfies Newton’s identities. The algorithm does this by constructing trial polynomials $\sigma^{(\mu)}(x)$ which satisfy the first $\mu$ Newton’s identities, one identity at a time, until $\sigma^{(2t)}(x)$ is reached, which is the minimal degree polynomial that satisfies the first 2t Newton’s identities. Once $\sigma(x)$ is known, all that remains to be done is to find its roots by exhaustively substituting all $n$ non-zero field elements $\alpha^i$ to see which ones result in $\sigma(\alpha^i) = 0$. The error locations are then the inverses of those $\alpha^i$’s. Remember, in a finite field the inverse of $\alpha^i$ is simply $\alpha^{n-i}$, where $n + 1$ is the order of the field.

Let’s go into detail about the procedure. First, let

$$\sigma^{(\mu)}(x) = 1 + \sigma_1^{(\mu)} x + \sigma_2^{(\mu)} x^2 + \cdots + \sigma_{l_\mu}^{(\mu)} x^{l_\mu}$$

be the trial polynomial constructed on the $\mu$th iterative step, whose coefficients produce a polynomial that satisfies the first $\mu$ Newton’s identities. $l_\mu$ is the degree of $\sigma^{(\mu)}(x)$. To determine $\sigma^{(\mu+1)}(x)$, we must make a correction to $\sigma^{(\mu)}(x)$ to make sure it satisfies the first $\mu + 1$ Newton’s identities. We do this by computing the discrepancy:

$$d_\mu = S_{\mu+1} + \sigma_1^{(\mu)} S_\mu + \sigma_2^{(\mu)} S_{\mu-1} + \cdots + \sigma_{l_\mu}^{(\mu)} S_{\mu+1-l_\mu}$$

if $d_\mu = 0$, we simply set $\sigma^{(\mu+1)}(x) = \sigma^{(\mu)}(x)$. If $d_\mu$ is nonzero, we need to look to a previous iteration, indexed at $\rho$, where $d_\rho \neq 0$ and $\rho - l_\rho$ is largest. With the $\sigma^{(\rho)}(x)$ from that iteration, set

$$\sigma^{(\mu+1)}(x) = \sigma^{(\mu)}(x) + d_\mu d_\rho^{-1} x^{\mu-\rho} \sigma^{(\rho)}(x)$$

which gives the minimal degree polynomial which satisfies the first $\mu + 1$ Newton’s identities. This step will definitely be confusing for any reasonable person, but hopefully it will make sense when we do an example. Also, so far the whole procedure has been presented abstractly, but the thing to remember is that everything, from the coefficients of $\sigma^{(\mu)}(x)$ to the discrepancies $d_\mu$ to the error locator values $\beta_l$, are all elements of the finite field. So they’re all some $\alpha^i$ (including $1 = \alpha^0$) or 0.

The Berlekamp procedure is best summarized with a table. Regardless of the problem, the default starting table is

$\mu$   $\sigma^{(\mu)}(x)$   $d_\mu$   $l_\mu$   $\mu - l_\mu$
$-1$    $1$                   $1$       $0$       $-1$
$0$     $1$                   $S_1$     $0$       $0$

Rows are filled in one at a time until row $2t$ is reached, and we start on row 0. Assuming we’ve filled up to the $\mu$th row, filling the $(\mu+1)$st row is done as follows:

  1. Compute the discrepancy $d_\mu$ for the $\mu$th row
  2. If $d_\mu = 0$, then $\sigma^{(\mu+1)}(x) = \sigma^{(\mu)}(x)$ and $l_{\mu+1} = l_\mu$
  3. If $d_\mu \neq 0$, then find another row prior to (but not including) the $\mu$th row, row $\rho$, such that $d_\rho \neq 0$ and $\rho - l_\rho$ is the largest value. Then update $\sigma$ via

    $$\sigma^{(\mu+1)}(x) = \sigma^{(\mu)}(x) + d_\mu d_\rho^{-1} x^{\mu-\rho} \sigma^{(\rho)}(x)$$

    and $l_{\mu+1} = \max(l_\mu,\ l_\rho + \mu - \rho)$ – but this equation isn’t needed in practice though: $l_{\mu+1}$ is always the degree of $\sigma^{(\mu+1)}(x)$, which can be easily read off. A code sketch implementing these rules follows.
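
Here’s the promised sketch of the whole iteration (the data layout and helper names are my own; it follows the update rules above and reproduces the worked example below):

```python
# Berlekamp's algorithm over GF(16). Field elements are 4-bit ints, and
# sigma is a list of coefficients, lowest degree first.
exp, x = [], 1
for _ in range(15):
    exp.append(x)
    x <<= 1
    if x & 0b10000:
        x ^= 0b10011                          # alpha^4 = alpha + 1
log = {v: i for i, v in enumerate(exp)}

def gmul(a, b):                               # multiply via exponent addition
    return exp[(log[a] + log[b]) % 15] if a and b else 0

def berlekamp(S, t):                          # S[i] holds the syndrome S_(i+1)
    rows = [(-1, [1], 1, 0), (0, [1], S[0], 0)]   # (mu, sigma, d, l)
    for mu in range(2 * t):
        _, sigma, d, l = rows[-1]
        new = sigma[:]
        if d != 0:
            # prior row rho with d_rho != 0 that maximizes rho - l_rho
            rho, srho, drho, lrho = max(
                (row for row in rows[:-1] if row[2] != 0),
                key=lambda row: row[0] - row[3])
            scale = gmul(d, exp[-log[drho] % 15])   # d * drho^(-1)
            shift = mu - rho
            new += [0] * (len(srho) + shift - len(new))
            for i, c in enumerate(srho):            # add scale * x^shift * srho
                new[i + shift] ^= gmul(scale, c)
        lnew = len(new) - 1
        dnew = 0                                    # discrepancy of the new row
        if mu + 1 < 2 * t:
            dnew = S[mu + 1]
            for i in range(1, lnew + 1):
                if mu + 1 - i >= 0:
                    dnew ^= gmul(new[i], S[mu + 1 - i])
        rows.append((mu + 1, new, dnew, lnew))
    return rows[-1][1]

S = [1, 1, exp[10], 1, exp[10], exp[5]]       # syndromes of the example below
sigma = berlekamp(S, 3)
print([f"a^{log[c]}" if c else "0" for c in sigma])
# -> ['a^0', 'a^0', '0', 'a^5'], i.e. sigma(x) = 1 + x + alpha^5 x^3
```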

All of this sounds super complicated, but it really isn’t: it’s just tedious. Let’s do an example to see how it works in practice. Let’s say that we are using a (15, 5) triple error correcting code, and the transmitted codeword v = 000000000000000 is received as r = 001000000101000, or $r(x) = x^{12} + x^5 + x^3$. The first step is to compute the syndromes. There are six syndromes to compute since we are using a triple error correcting code. To follow the computations, the reader may find it useful to refer to the tables for $GF(16)$ that we compiled in an earlier section.

We need to compute syndromes for $\alpha^1$ to $\alpha^6$. The minimal polynomial for $\alpha$, $\alpha^2$, and $\alpha^4$ is $\phi_1(x) = x^4 + x + 1$. The minimal polynomial for $\alpha^3$ and $\alpha^6$ is $\phi_3(x) = x^4 + x^3 + x^2 + x + 1$. The minimal polynomial for $\alpha^5$ is $\phi_5(x) = x^2 + x + 1$. By pen and paper, the easiest way to compute the syndromes is to compute the remainder of $r(x)$ after dividing by $\phi_i(x)$, and evaluating the remainder at $\alpha^i$. Let’s do this for each $\phi_i(x)$:

$$r(x) \bmod \phi_1(x) = 1, \quad \text{so } S_1 = 1,\ S_2 = 1,\ S_4 = 1$$

$$r(x) \bmod \phi_3(x) = x^3 + x^2 + 1, \quad \text{so } S_3 = \alpha^9 + \alpha^6 + 1 = \alpha^{10} \text{ and } S_6 = \alpha^{18} + \alpha^{12} + 1 = \alpha^5$$

$$r(x) \bmod \phi_5(x) = x + 1, \quad \text{so } S_5 = \alpha^5 + 1 = \alpha^{10}$$

So:

$$S_1 = 1, \quad S_2 = 1, \quad S_3 = \alpha^{10}, \quad S_4 = 1, \quad S_5 = \alpha^{10}, \quad S_6 = \alpha^5$$
We’ll write down the final table first, and then dissect the results row by row:

$\mu$   $\sigma^{(\mu)}(x)$        $d_\mu$        $l_\mu$   $\mu - l_\mu$
$-1$    $1$                        $1$            $0$       $-1$
$0$     $1$                        $1$            $0$       $0$
$1$     $1 + x$                    $0$            $1$       $0$
$2$     $1 + x$                    $\alpha^5$     $1$       $1$
$3$     $1 + x + \alpha^5 x^2$     $0$            $2$       $1$
$4$     $1 + x + \alpha^5 x^2$     $\alpha^{10}$  $2$       $2$
$5$     $1 + x + \alpha^5 x^3$     $0$            $3$       $2$
$6$     $1 + x + \alpha^5 x^3$     –              –         –

We start at $\mu = 0$. Since our discrepancy (by default) is $d_0 = S_1 = 1$, which is not zero, to compute $\sigma^{(1)}(x)$, we need to choose a previous row where $d_\rho \neq 0$, with the largest $\rho - l_\rho$. Since we are not allowed to choose our current row, there is only one choice: $\rho = -1$. Using this row, we have

$$\sigma^{(1)}(x) = \sigma^{(0)}(x) + d_0 d_{-1}^{-1} x^{0-(-1)} \sigma^{(-1)}(x) = 1 + (1)(1)^{-1} x (1) = 1 + x$$
This allows us to move to the $\mu = 1$ row. At this row, since the degree of $\sigma^{(1)}(x)$ is 1, $l_1 = 1$ and so $\mu - l_\mu = 0$. Now we need to compute the discrepancy for this row, which is done by computing

$$d_1 = S_2 + \sigma_1^{(1)} S_1$$
Here, $S_2 = 1$, $\sigma_1^{(1)} = 1$, and $S_1 = 1$, so $d_1 = 1 + 1 = 0$. Since the discrepancy is zero, we’re free to move onto the next row by setting

$$\sigma^{(2)}(x) = \sigma^{(1)}(x) = 1 + x$$
so $\sigma^{(2)}(x) = 1 + x$, and $l_2 = 1$, and $\mu - l_\mu = 1$. Again, we need to compute the discrepancy:

$$d_2 = S_3 + \sigma_1^{(2)} S_2 = \alpha^{10} + (1)(1) = \alpha^5$$
Since $d_2$ is not zero, we need to select the previous row with $d_\rho \neq 0$ and largest $\rho - l_\rho$. We have two choices this time where $d_\rho \neq 0$: $\rho = -1$ and $\rho = 0$, of which $\rho = 0$ has the largest $\rho - l_\rho$. Using this row, we are able to compute $\sigma^{(3)}(x)$:

$$\sigma^{(3)}(x) = \sigma^{(2)}(x) + d_2 d_0^{-1} x^{2-0} \sigma^{(0)}(x)$$

$$= (1 + x) + \alpha^5 (1)^{-1} x^2 (1)$$

$$= 1 + x + \alpha^5 x^2$$
On this row, $l_3 = 2$ and $\mu - l_\mu = 1$. The discrepancy is

$$d_3 = S_4 + \sigma_1^{(3)} S_3 + \sigma_2^{(3)} S_2 = 1 + \alpha^{10} + \alpha^5 = 0$$
Since $d_3 = 0$, we are free to move onto $\mu = 4$ with the same $\sigma$:

$$\sigma^{(4)}(x) = \sigma^{(3)}(x) = 1 + x + \alpha^5 x^2$$
On this row, $l_4 = 2$ and $\mu - l_\mu = 2$. The discrepancy is

$$d_4 = S_5 + \sigma_1^{(4)} S_4 + \sigma_2^{(4)} S_3 = \alpha^{10} + 1 + \alpha^5 \alpha^{10} = \alpha^{10} + 1 + 1 = \alpha^{10}$$
Since $d_4$ is nonzero, we have to again choose a previous row. The options available with $d_\rho \neq 0$ are $\rho = -1$, $\rho = 0$, and $\rho = 2$. Of these, $\rho = 2$ has the largest $\rho - l_\rho$, of 1. With this $\rho$,

$$\sigma^{(5)}(x) = \sigma^{(4)}(x) + d_4 d_2^{-1} x^{4-2} \sigma^{(2)}(x)$$

$$= (1 + x + \alpha^5 x^2) + \alpha^{10} (\alpha^5)^{-1} x^2 (1 + x)$$

$$= 1 + x + \alpha^5 x^2 + \alpha^5 x^2 + \alpha^5 x^3$$

$$= 1 + x + \alpha^5 x^3$$

(where we used the fact that $(\alpha^5)^{-1} = \alpha^{10}$ and that $\alpha^{10}\alpha^{10} = \alpha^{20} = \alpha^5$)

At row 5, $l_5 = 3$ and $\mu - l_\mu = 2$. The discrepancy at this row is

$$d_5 = S_6 + \sigma_1^{(5)} S_5 + \sigma_2^{(5)} S_4 + \sigma_3^{(5)} S_3$$

$$= \alpha^5 + (1)\alpha^{10} + (0)(1) + \alpha^5 \alpha^{10}$$

$$= \alpha^5 + \alpha^{10} + 1$$

$$= 0$$
Since $d_5 = 0$, $\sigma^{(6)}(x) = \sigma^{(5)}(x)$. Furthermore, since we’ve reached row 6, which is the final row, we have found $\sigma(x)$. So the error locator polynomial is

$$\sigma(x) = 1 + x + \alpha^5 x^3$$
Now all that remains is to find the roots of $\sigma(x)$. We can do this by exhaustively trying all values of $\alpha^i$. We will do the first few to get a taste of the arithmetic, and then state the other results.

$$\sigma(1) = 1 + 1 + \alpha^5 = \alpha^5$$

$$\sigma(\alpha) = 1 + \alpha + \alpha^5\alpha^3 = 1 + \alpha + \alpha^8 = \alpha^5$$

$$\sigma(\alpha^2) = 1 + \alpha^2 + \alpha^5\alpha^6 = 1 + \alpha^2 + \alpha^{11} = \alpha^7$$

$$\sigma(\alpha^3) = 1 + \alpha^3 + \alpha^5\alpha^9 = 1 + \alpha^3 + \alpha^{14} = 0$$

So $\alpha^3$ is a root. Similarly, $\sigma(\alpha^{10}) = 0$ and $\sigma(\alpha^{12}) = 0$, so $\alpha^3$, $\alpha^{10}$, and $\alpha^{12}$ are the roots of the error locator polynomial. Note that the Galois field element 1 corresponds to $\alpha^0 = \alpha^{15}$. To find the error locations, we take the inverses of each of these:

$$(\alpha^3)^{-1} = \alpha^{12}, \quad (\alpha^{10})^{-1} = \alpha^5, \quad (\alpha^{12})^{-1} = \alpha^3$$

So the error locations are at bits 3, 5, and 12, which is where we put the errors in the beginning of the example. Again, note that if one of the roots of $\sigma(x)$ had been 1, then the inverse would also be $1 = \alpha^{15} = \alpha^0$, which indicates an error at the $x^0$ coefficient, i.e. the least significant bit, consistent with indexing the location exponents from 1 to 15 rather than from 0 to 14. At this stage, if we knew that the message was encoded systematically, we could just read off the 5 most significant bits. If we knew that the message was encoded non-systematically, we would have to divide the codeword by the generator polynomial to recover the message.
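
To close the loop, here’s a sketch of the final root search and correction for this example (helper names are my own):

```python
# Find the roots of sigma(x) = 1 + x + alpha^5 x^3 by trying every alpha^i,
# invert the roots to get error locations, and flip those bits of r.
exp, x = [], 1
for _ in range(15):
    exp.append(x)
    x <<= 1
    if x & 0b10000:
        x ^= 0b10011                 # alpha^4 = alpha + 1
log = {v: i for i, v in enumerate(exp)}

def gmul(a, b):
    return exp[(log[a] + log[b]) % 15] if a and b else 0

sigma = [1, 1, 0, exp[5]]            # coefficients, lowest degree first

def sigma_at(v):                     # evaluate sigma at a field element v
    acc, p = 0, 1
    for c in sigma:
        acc ^= gmul(c, p)
        p = gmul(p, v)
    return acc

roots = [i for i in range(15) if sigma_at(exp[i]) == 0]
locations = [-i % 15 for i in roots]  # inverse of alpha^i is alpha^(15 - i)
print(sorted(locations))              # -> [3, 5, 12]

r = 0b001000000101000
for j in locations:
    r ^= 1 << j                       # flip the erroneous coefficient of x^j
print(f"{r:015b}")                    # -> 000000000000000, as transmitted
```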