An introduction to BCH codes


I’ve been working with error correcting codes, and I had a hard time finding good, consolidated sources online which explained the background at a level of detail that I was comfortable with. Things were either very high level, which didn’t give enough details to implement a real solution, or overly theoretical, which won’t appeal to anyone not accustomed to the theorem, lemma, proof style of graduate math textbooks. What I hope to do here is give an explanation of cyclic error correcting codes and BCH codes, aimed at engineers and applied scientists who want to understand an implementation and get a flavor of the theory, without going into proofs. For this reason, I’ll just state results without proving them. Instead, I will try to give an outline of how to implement each part of the process. Those interested in a more complete picture are encouraged to refer to Lin and Costello.

This blog also assumes familiarity with regular linear error correcting codes (like Hamming codes), so I won’t go over any general background on error correcting codes.

The blog will be divided into the following sections:

  1. Background on finite fields
  2. Introduction to binary cyclic linear error correcting codes
  3. Introduction to BCH codes
  4. Encoding BCH codes
  5. Decoding BCH codes

Finite fields

The mathematics of cyclic error correcting codes is based on finite fields, so it’s impossible to have a meaningful discussion of cyclic codes without first giving the necessary background on finite fields. We’ll try to cover the minimum necessary material to understand future sections, with examples along the way to aid understanding. Let’s get started!

Finite fields, or Galois fields, written $GF(q)$, are fields (i.e. they are closed under, and have inverses for, addition and multiplication) which have a finite number of elements. The simplest examples are the integers modulo a prime – $GF(2)$ is the field $\{0, 1\}$ with addition and multiplication modulo 2, $GF(3)$ is the field $\{0, 1, 2\}$ with addition and multiplication modulo 3, etc. To aid understanding, let’s write out the addition and multiplication tables for $GF(5)$. When adding or multiplying, everything is modulo 5, so $4 + 4 = 8 \bmod 5 = 3$, and $4 \times 3 = 12 \bmod 5 = 2$, for example.

The addition table is:

+ 0 1 2 3 4
0 0 1 2 3 4
1 1 2 3 4 0
2 2 3 4 0 1
3 3 4 0 1 2
4 4 0 1 2 3

The multiplication table is:

x 0 1 2 3 4
0 0 0 0 0 0
1 0 1 2 3 4
2 0 2 4 1 3
3 0 3 1 4 2
4 0 4 3 2 1
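
If you want to generate tables like these yourself, here’s a minimal Python sketch (the function name is my own) that reproduces them for any prime modulus:

```python
# Print the addition and multiplication tables for the integers modulo a
# prime p, i.e. GF(p). Every result reduces modulo p.
def print_tables(p):
    for symbol, op in (("+", lambda a, b: (a + b) % p),
                       ("x", lambda a, b: (a * b) % p)):
        print(symbol, *range(p))
        for a in range(p):
            print(a, *(op(a, b) for b in range(p)))
        print()

print_tables(5)
```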

Galois fields must have a number of elements (order) equal to a prime power, and all Galois fields with the same order are isomorphic to each other. To construct a Galois field with a non-prime number of elements (i.e. a prime power number of elements), or an extension field, we use polynomials instead of integers to represent field elements. To construct the extension field $GF(q^m)$ from the base field $GF(q)$, we use $GF(q)[x]$ modulo $p(x)$, written $GF(q)[x]/p(x)$. $GF(q)[x]$ is pronounced “GF(q) adjoin x”, and represents the set of all polynomials in $x$ whose coefficients are in $GF(q)$, and $p(x)$ is an irreducible polynomial over $GF(q)$ of degree $m$. Irreducible polynomials are polynomials which cannot be factored into polynomials of lower degree over the same field. For example, in $GF(2)[x]$, $x^2 + x + 1$ is irreducible, but $x^2 + 1$ is not, since $(x + 1)(x + 1) = x^2 + 2x + 1$, but $2 = 0$, so $x^2 + 2x + 1 = x^2 + 1$. The irreducible polynomials of the first few degrees for $GF(2)$ are:

Degree   Irreducible Polynomials
2        $x^2 + x + 1$
3        $x^3 + x + 1$, $x^3 + x^2 + 1$
4        $x^4 + x + 1$, $x^4 + x^3 + 1$, $x^4 + x^3 + x^2 + x + 1$
5        $x^5 + x^2 + 1$, $x^5 + x^3 + 1$, $x^5 + x^3 + x^2 + x + 1$, $x^5 + x^4 + x^2 + x + 1$, $x^5 + x^4 + x^3 + x + 1$, $x^5 + x^4 + x^3 + x^2 + 1$

There are, in general, multiple irreducible polynomials of each degree for each base field. In practice, we choose the one that has the smallest weight (so we prefer to use $x^4 + x + 1$ instead of $x^4 + x^3 + x^2 + x + 1$, for example). Since all finite fields of a given order are isomorphic, constructing an extension field of a particular order with any valid irreducible polynomial creates an isomorphic finite field (the addition and multiplication tables will just have different entries). Let’s do an example of constructing an extension field from the base field $GF(3)$. We choose $p(x) = x^2 + 1$, which is an irreducible polynomial of degree 2 in $GF(3)[x]$. $GF(9) = GF(3)[x]/(x^2 + 1)$ is then the set of all polynomials with coefficients in $GF(3)$ modulo $x^2 + 1$, or the set $\{0, 1, 2, x, x + 1, x + 2, 2x, 2x + 1, 2x + 2\}$, which has 9 elements. Addition and multiplication using this polynomial representation is modulo $x^2 + 1$ and modulo 3. Again, for understanding, let’s write out the multiplication table (the addition table is easier to calculate) for $GF(9)$ and work out some specific examples.

The multiplication table is:

*      0     1     2     x     x+1   x+2   2x    2x+1  2x+2
0      0     0     0     0     0     0     0     0     0
1      0     1     2     x     x+1   x+2   2x    2x+1  2x+2
2      0     2     1     2x    2x+2  2x+1  x     x+2   x+1
x      0     x     2x    2     x+2   2x+2  1     x+1   2x+1
x+1    0     x+1   2x+2  x+2   2x    1     2x+1  2     x
x+2    0     x+2   2x+1  2x+2  1     x     x+1   2x    2
2x     0     2x    x     1     2x+1  x+1   2     2x+2  x+2
2x+1   0     2x+1  x+2   x+1   2     2x    2x+2  x     1
2x+2   0     2x+2  x+1   2x+1  x     2     x+2   1     2x

Here are some randomly selected worked out examples so you can get a feel for modular arithmetic on polynomials. For each of these, remember that $3 = 0$ and $3x = 0$, so adding 3 or $3x$ is the same as adding 0. I also want to reduce $x^2 + 1$ to 0 (i.e. substitute $x^2 = 2$) to get rid of $x^2$ terms, since the result of an operation should always be a member of the field (closure) and the elements in the field only go up to degree 1.

$$(x + 2) + (2x + 2) = 3x + 4 = 0 + 1 = 1$$

$$2(2x + 2) = 4x + 4 = x + 1$$

$$(x + 1)(x + 1) = x^2 + 2x + 1 = 2 + 2x + 1 = 2x + 3 = 2x$$

$$(2x + 1)(x + 2) = 2x^2 + 5x + 2 = 2x^2 + 2x + 2 = 2 \cdot 2 + 2x + 2 = 2x + 6 = 2x$$
To someone who has done regular boring arithmetic their whole life, modular arithmetic can seem strange and unwieldy at first, but you may find it more enjoyable once you get the hang of it!
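
Here’s the same arithmetic as a short Python sketch (the pair representation and helper names are my own), which you can check against the table and worked examples above:

```python
# Elements of GF(9) = GF(3)[x]/(x^2 + 1) are stored as pairs [b, a]
# meaning a*x + b. Coefficients reduce mod 3, and x^2 reduces to 2.
Q = 3

def add(u, v):
    return [(a + b) % Q for a, b in zip(u, v)]

def mul(u, v):
    b, a = u
    d, c = v
    # (ax + b)(cx + d) = ac*x^2 + (ad + bc)*x + bd, then substitute x^2 = 2
    return [(b * d + 2 * a * c) % Q, (a * d + b * c) % Q]

print(add([2, 1], [2, 2]))  # (x + 2) + (2x + 2) -> [1, 0], i.e. 1
print(mul([1, 2], [2, 1]))  # (2x + 1)(x + 2)    -> [0, 2], i.e. 2x
```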

A primitive element, $\alpha$, of a Galois field, is an element whose successive powers generate all elements of the field except 0. In other words, it is a primitive $(n-1)$th root of unity, where $n$ is the number of elements in the field. Finding primitive elements is a matter of brute force: try all elements other than 0 and 1. This is easier than it sounds though, since a field with $n$ elements contains exactly $\phi(n-1)$ primitive elements ($\phi$ being Euler’s totient function) – often a sizeable fraction of the field. For example, 2 is a primitive element of $GF(3)$ and $GF(5)$, but not $GF(7)$. $x + 1$ is a primitive element of the $GF(9)$ we constructed above. To see why, let’s do the multiplication, which is made easy by jumping to the correct cell in the multiplication table above.

$$(x + 1)^1 = x + 1$$

$$(x + 1)^2 = 2x$$

$$(x + 1)^3 = 2x(x + 1) = 2x + 1$$

$$(x + 1)^4 = (2x + 1)(x + 1) = 2$$

$$(x + 1)^5 = 2(x + 1) = 2x + 2$$

$$(x + 1)^6 = (2x + 2)(x + 1) = x$$

$$(x + 1)^7 = x(x + 1) = x + 2$$

$$(x + 1)^8 = (x + 2)(x + 1) = 1$$
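
The brute-force search itself is equally short. This sketch (helper names are my own) confirms that $x+1$, $x+2$, $2x+1$, and $2x+2$ are exactly the primitive elements of our $GF(9)$:

```python
# An element of GF(9) is primitive iff its multiplicative order is 8,
# the number of nonzero field elements. Elements are [b, a] = a*x + b.
Q = 3

def mul(u, v):
    b, a = u
    d, c = v
    return [(b * d + 2 * a * c) % Q, (a * d + b * c) % Q]  # x^2 -> 2

def order(e):
    acc, n = e, 1
    while acc != [1, 0]:        # [1, 0] represents the element 1
        acc = mul(acc, e)
        n += 1
    return n

elems = [[b, a] for a in range(Q) for b in range(Q) if [b, a] != [0, 0]]
print([e for e in elems if order(e) == 8])
# -> [[1, 1], [2, 1], [1, 2], [2, 2]], i.e. x+1, x+2, 2x+1, 2x+2
```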
We’ve been using the base field $GF(3)$ in this section to give an example with field order higher than 2. In the remainder of this blog though, we’ll use $GF(2)$ as the base field in examples, since ultimately we’re interested in binary codes, for which $GF(2)$ is always the base field.

There is a unique polynomial $\phi(x)$ over $GF(q)$ for each element $\beta$ in $GF(q^m)$ which is the monic polynomial of smallest degree such that $\phi(\beta) = 0$, called the minimal polynomial of $\beta$. A primitive polynomial is the minimal polynomial of a primitive element. As an example, let’s construct the finite field $GF(2^5)$, or $GF(32)$. We use $p(x) = \alpha^5 + \alpha^2 + 1$. Some examples of elements are: 0, 1, $\alpha$, $\alpha^2 + 1$, $\alpha^3 + \alpha$, $\alpha^4 + \alpha^3 + \alpha$, etc. For a primitive element $\alpha$, we can generate each element of $GF(32)$ by successive multiplication by $\alpha$, as shown below. We use $\alpha$ instead of $x$ as the symbol for finite field polynomial terms because we will start reserving $x$ for minimal polynomials.

$$\alpha^1 = \alpha$$

$$\alpha^2 = \alpha^2$$

$$\alpha^3 = \alpha^3$$

$$\alpha^4 = \alpha^4$$

$$\alpha^5 = \alpha^2 + 1 \quad \text{(since we're in modulo 2 and modulo } \alpha^5 + \alpha^2 + 1\text{)}$$

$$\alpha^6 = \alpha^3 + \alpha$$

$$\alpha^7 = \alpha^4 + \alpha^2$$

$$\alpha^8 = \alpha^3 + \alpha^2 + 1$$

$$\alpha^9 = \alpha^4 + \alpha^3 + \alpha$$

…and so forth, until $\alpha^{31} = \alpha^0 = 1$.
Each $\alpha^i$ has a minimal polynomial associated with it, and each minimal polynomial has at least one $\alpha^i$ as a root. They can be found through the fact that if $\beta$ is a root of a minimal polynomial over the base field $GF(2)$, then so is $\beta^2$. So for our example using $GF(32)$, $\alpha$, $\alpha^2$, $\alpha^4$, $\alpha^8$, $\alpha^{16}$ all share the same minimal polynomial, and so do $\alpha^3$, $\alpha^6$, $\alpha^{12}$, $\alpha^{24}$, $\alpha^{17}$ (17 = 48 mod 31). To find the minimal polynomial for $\alpha^3$, $\alpha^6$, $\alpha^{12}$, $\alpha^{24}$, $\alpha^{17}$, we can calculate $(x + \alpha^3)(x + \alpha^6)(x + \alpha^{12})(x + \alpha^{24})(x + \alpha^{17})$ and simplify. All the $\alpha$’s should cancel out and we should be left with a polynomial in $x$ only. We’ll go over a more explicit example where minimal polynomials are computed in Section 3.
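
If you’d like to see that cancellation happen without grinding through the algebra, here’s a Python sketch (the bit-packed representation and helper names are my own) that builds $GF(32)$, collects the conjugates of $\alpha^3$, and expands the product:

```python
# GF(2^5) built with p(x) = x^5 + x^2 + 1. Field elements are 5-bit ints
# where bit i holds the coefficient of alpha^i.
M, P = 5, 0b100101

def gf_mul(a, b):
    r = 0
    while b:                        # carry-less (XOR-based) multiplication
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    for i in range(2 * M - 2, M - 1, -1):
        if r & (1 << i):            # reduce modulo p(alpha)
            r ^= P << (i - M)
    return r

alpha, x = [], 1
for _ in range(31):                 # alpha[i] = alpha^i as a bit pattern
    alpha.append(x)
    x = gf_mul(x, 0b10)

exps, e = [], 3                     # conjugates of alpha^3: square repeatedly,
while e not in exps:                # i.e. double the exponent mod 31
    exps.append(e)
    e = 2 * e % 31
print(exps)                         # [3, 6, 12, 24, 17]

poly = [1]                          # coefficients in GF(32), lowest degree first
for e in exps:                      # multiply by (x + alpha^e)
    new = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        new[i + 1] ^= c                   # the x * poly(x) part
        new[i] ^= gf_mul(alpha[e], c)     # the alpha^e * poly(x) part
    poly = new
print(poly)   # [1, 0, 1, 1, 1, 1]: every coefficient collapses to 0 or 1,
              # giving the minimal polynomial x^5 + x^4 + x^3 + x^2 + 1
```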

Linear binary cyclic codes

Next, let’s discuss linear binary cyclic codes.

  • Linear: The weighted sum of any two codewords is also a codeword

  • Binary: The codeword symbols are 0 and 1

  • Cyclic: Any cyclic shift of a codeword is another codeword

For example, all sets of valid 4-length linear binary cyclic codes are:

  • {0000} (the trivial zero code)
  • {0000, 1111}
  • {0000, 0101, 1010, 1111}
  • {0000, 0011, 0110, 1100, 1001, 0101, 1010, 1111} (all even-weight words)
  • {0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111} (all 4-bit words)

Binary linear cyclic codes of length $n$ are represented by polynomials in $GF(2)[x]/(x^n + 1)$. In this representation, multiplying by $x$ amounts to a cyclic left-shift by 1, since $x \cdot x^{n-1} = x^n = 1$. Polynomials over $GF(2)$ naturally work to represent sequences of binary numbers, since their coefficients are either 0 or 1, so the resulting polynomials can be thought of as binary messages with the most significant bit (MSB) at either the highest or the lowest degree of $x$. Binary modular arithmetic and cyclic shifting are also easy to implement in hardware circuits, with exclusive-OR gates and shift registers.

Furthermore, this means that ANY polynomial in $GF(2)[x]/(x^n + 1)$ multiplied by a valid codeword is another valid codeword, since valid codewords are some subset of $GF(2)[x]/(x^n + 1)$, and multiplication by an arbitrary polynomial is a linear combination of cyclic shifts of the original codeword. Since all shifted codewords are valid codewords and all linear combinations of codewords are valid codewords, the resulting polynomial must be a valid codeword.

The question, then, is: how do we generate codewords from messages? I.e. how do we choose a generator polynomial $g(x)$ which maps all valid messages to all valid codewords? Furthermore, how do we DESIGN the generator polynomial so that we get the block length, message length, and error correction capability that we want?

For a cyclic code $C(n, k)$:

  • The generator polynomial $g(x)$ is monic (its highest power has coefficient 1)
  • $C$ consists of all multiples of $g(x)$ with polynomials of degree $k - 1$ or less
  • $g(x)$ is a factor of $x^n + 1$

The third point is the most important, and means that we are in the business of factoring $x^n + 1$. We want the prime factors of $x^n + 1$ in $GF(2)[x]$, and $g(x)$ is the polynomial with the smallest degree which generates the codeword polynomials we want. So if we have codewords of degree $n - 1$ and messages of degree $k - 1$, then $g(x)$ must have degree $n - k$.

For example, all possible cyclic codeword polynomials of length three reside in $GF(2)[x]/(x^3 + 1)$. $x^3 + 1$ factors into $(x + 1)(x^2 + x + 1)$, so the possible generator polynomials are:

$g(x) = 1$ (degree 0)
$g(x) = x + 1$ (degree 1)
$g(x) = x^2 + x + 1$ (degree 2)
$g(x) = (x + 1)(x^2 + x + 1) = x^3 + 1$ (degree 3)

  • for $g(x) = 1$, the degree of the message is 2. This is the trivial identity mapping, which maps all codewords of length 3 to themselves.
  • for $g(x) = x + 1$, the degree of the message is 1. This is a mapping between messages of length 2 and codewords of length 3 (the set of codewords is exactly the even-weight words). The snippet below enumerates these mappings.
  • for $g(x) = x^2 + x + 1$, the degree of the message is 0. This is a mapping between messages of length 1 and codewords of length 3. There are only two messages of length 1: 0, and 1, and only two codewords: 000, 111, so this is a repetition code!
  • for $g(x) = x^3 + 1$, because $x^3 + 1 = 0$ in $GF(2)[x]/(x^3 + 1)$, everything gets trivially mapped to 0.
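
Here’s that enumeration as a Python sketch (helper names and the integer bit-packing are my own):

```python
# Enumerate the length-3 cyclic codes generated by each nonzero divisor of
# x^3 + 1. Polynomials are ints with bit i holding the coefficient of x^i;
# since deg(m) + deg(g) <= 2 here, no reduction mod x^3 + 1 is needed.
def clmul(a, b):                    # carry-less polynomial multiplication
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    return r

for g, k in [(0b1, 3), (0b11, 2), (0b111, 1)]:  # g(x) = 1, x+1, x^2+x+1
    words = sorted(clmul(m, g) for m in range(1 << k))
    print(f"g = {g:03b}:", [f"{w:03b}" for w in words])
# g = 001: all eight 3-bit words
# g = 011: 000, 011, 101, 110 (even parity)
# g = 111: 000, 111 (repetition)
```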

The beauty and usefulness of creating codes in this way is that decoding a received code has the following properties:

For a received codeword $r(x)$,

$$r(x) = v(x) + e(x)$$

where $v(x) = m(x)g(x)$ is the true transmitted codeword, and $e(x)$ is the error. Since we know which roots $\alpha^i$ we constructed $g(x)$ with (more on this in the next section), $v(x)$ is 0 at any of those $\alpha^i$’s. So

$$r(\alpha^i) = v(\alpha^i) + e(\alpha^i) = e(\alpha^i)$$

furthermore,

$$r(x) \bmod g(x) = e(x) \bmod g(x)$$

since $v(x) \bmod g(x) = 0$. These two facts help us locate and correct errors.

BCH Codes

BCH codes give an easy and useful way of defining $g(x)$ given a blocklength and a number of error correction bits. Recall that, in order to find valid generator polynomials, we are looking for factors of $x^n + 1$. To link the factorization of $x^n + 1$ with finite fields, we use the fact that the roots of $x^{2^m - 1} + 1$ are the non-zero elements of $GF(2^m)$!

In other words, the roots of $x^{2^m - 1} + 1$ form a finite field of order $2^m$ if 0 is added. So for the elements $\alpha^i$ of $GF(2^m)$,

$$x^{2^m - 1} + 1 = \prod_{i=0}^{2^m - 2} (x + \alpha^i)$$

We have already seen that some of the factors of $x^n + 1$ combine to form prime factors, and this is true here as well. Each non-zero element of $GF(2^m)$ is the root of EXACTLY ONE of the prime factors of $x^{2^m - 1} + 1$. Therefore, $g(x)$ must be the least common multiple (LCM) of some of the minimal polynomials of the $\alpha^i$’s. Which ones depends on the code we are trying to create. The key concept of primitive, narrow-sense BCH codes is that, for $t$ bits of desired error correcting capability, the generator polynomial is the lowest degree polynomial with $\alpha^1, \alpha^2, \ldots, \alpha^{2t}$ as its roots.

The steps to construct a BCH code are:

  1. For a given blocklength $n$ with $n = 2^m - 1$, construct a finite field $GF(2^m)$
  2. Find the minimal polynomials $\phi_i(x)$ of the elements $\alpha^i$ of $GF(2^m)$
  3. The generator polynomial is $g(x) = \text{LCM}(\phi_1(x), \phi_2(x), \ldots, \phi_{2t}(x))$, where $t$ is the number of error correcting bits desired. The Hamming distance is $d = 2t + 1$
  4. The message length is $k = n - \deg(g(x))$. This creates an $(n, k, d)$ BCH code.

Let’s do an example for a code of blocklength $n = 15$. This requires us to construct the extension field $GF(16) = GF(2^4)$, which we can do with the irreducible polynomial $x^4 + x + 1$. So our extension field is $GF(2)[x]/(x^4 + x + 1)$. The field elements are:

element        polynomial                          binary
$0$            $0$                                 0000
$1$            $1$                                 0001
$\alpha$       $\alpha$                            0010
$\alpha^2$     $\alpha^2$                          0100
$\alpha^3$     $\alpha^3$                          1000
$\alpha^4$     $\alpha + 1$                        0011
$\alpha^5$     $\alpha^2 + \alpha$                 0110
$\alpha^6$     $\alpha^3 + \alpha^2$               1100
$\alpha^7$     $\alpha^3 + \alpha + 1$             1011
$\alpha^8$     $\alpha^2 + 1$                      0101
$\alpha^9$     $\alpha^3 + \alpha$                 1010
$\alpha^{10}$  $\alpha^2 + \alpha + 1$             0111
$\alpha^{11}$  $\alpha^3 + \alpha^2 + \alpha$      1110
$\alpha^{12}$  $\alpha^3 + \alpha^2 + \alpha + 1$  1111
$\alpha^{13}$  $\alpha^3 + \alpha^2 + 1$           1101
$\alpha^{14}$  $\alpha^3 + 1$                      1001

(note, $\alpha^{15} = \alpha^0 = 1$)
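
The table is mechanical to generate, since each row is just the previous row multiplied by $\alpha$, with $\alpha^4$ replaced by $\alpha + 1$ whenever it appears. A minimal Python sketch (the representation is my own):

```python
# Generate GF(16) as successive powers of alpha with p(x) = x^4 + x + 1.
# Elements are 4-bit ints where bit i holds the coefficient of alpha^i.
exp, x = [], 1
for _ in range(15):
    exp.append(x)                   # exp[i] = alpha^i
    x <<= 1                         # multiply by alpha
    if x & 0b10000:
        x ^= 0b10011                # substitute alpha^4 = alpha + 1
log = {v: i for i, v in enumerate(exp)}   # inverse lookup (discrete log)

for i, v in enumerate(exp):
    print(f"alpha^{i:<2} {v:04b}")
```

The `log` dictionary isn’t needed just to print the table, but it turns multiplication into addition of exponents, which the decoding sketches later on rely on.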

The first minimal polynomial is associated with $\alpha$, $\alpha^2$, $\alpha^4$, $\alpha^8$, so

$$\phi_1(x) = (x + \alpha)(x + \alpha^2)(x + \alpha^4)(x + \alpha^8) = x^4 + x + 1$$

similarly, $\phi_3(x)$ is associated with $\alpha^3$, $\alpha^6$, $\alpha^{12}$, $\alpha^9$ (24 − 15 = 9), so

$$\phi_3(x) = (x + \alpha^3)(x + \alpha^6)(x + \alpha^{12})(x + \alpha^9) = x^4 + x^3 + x^2 + x + 1$$

and so forth:

$$\phi_5(x) = (x + \alpha^5)(x + \alpha^{10}) = x^2 + x + 1$$

$$\phi_7(x) = (x + \alpha^7)(x + \alpha^{14})(x + \alpha^{13})(x + \alpha^{11}) = x^4 + x^3 + 1$$

So the minimal polynomials are

element                                            minimal polynomial
$\alpha, \alpha^2, \alpha^4, \alpha^8$             $x^4 + x + 1$
$\alpha^3, \alpha^6, \alpha^9, \alpha^{12}$        $x^4 + x^3 + x^2 + x + 1$
$\alpha^5, \alpha^{10}$                            $x^2 + x + 1$
$\alpha^7, \alpha^{11}, \alpha^{13}, \alpha^{14}$  $x^4 + x^3 + 1$
$\alpha^{15} = 1$                                  $x + 1$
A valid $g(x)$ is found by choosing $t$ and computing $\text{LCM}(\phi_1(x), \ldots, \phi_{2t}(x))$. For example:

  • For t=1, we get $g(x) = \phi_1(x) = x^4 + x + 1$ (note $\phi_2 = \phi_1$), and $g(x)$ has degree 4. For block length 15, this gives a (15, 11, 3) BCH code with one bit of error correcting capability.
  • For t=2, we get $g(x) = \phi_1(x)\phi_3(x) = x^8 + x^7 + x^6 + x^4 + 1$, and $g(x)$ has degree 8. This gives a (15, 7, 5) BCH code with two bits of error correcting capability.
  • For t=3, we get $g(x) = \phi_1(x)\phi_3(x)\phi_5(x) = x^{10} + x^8 + x^5 + x^4 + x^2 + x + 1$, and $g(x)$ has degree 10. This gives a (15, 5, 7) BCH code, and corrects 3 errors.
  • For t=4, we get $g(x) = \phi_1(x)\phi_3(x)\phi_5(x)\phi_7(x)$, and $g(x)$ has degree 14. This gives a (15, 1) repetition code with Hamming distance 15 and corrects 7 errors.
  • If we add the final factor, $x + 1$, then we get $x^{15} + 1$ as the generator polynomial. Since $x^{15} + 1 = 0$ in $GF(2)[x]/(x^{15} + 1)$, this generator polynomial maps everything to 0, which is a trivial mapping.
  • Also, we could have used just 1 as the generator polynomial, which maps everything to itself (i.e. a (15, 15, 1) code), which is another trivial mapping. The code sketch below builds each of these generators.
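
Since the $\phi_i(x)$ above are distinct irreducible polynomials, the LCM is just their product, so the construction is a few lines of code (helper names and the int representation are my own):

```python
# Build the BCH generator polynomials for n = 15. Polynomials are ints
# with bit i holding the coefficient of x^i.
def clmul(a, b):                    # carry-less polynomial multiplication
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    return r

phi1, phi3, phi5, phi7 = 0b10011, 0b11111, 0b111, 0b11001

g = phi1
print(f"t=1: g(x) = {g:b}, degree {g.bit_length() - 1}")
for t, phi in [(2, phi3), (3, phi5), (4, phi7)]:
    g = clmul(g, phi)               # multiply in the next minimal polynomial
    print(f"t={t}: g(x) = {g:b}, degree {g.bit_length() - 1}")
# t=1: 10011 (deg 4), t=2: 111010001 (deg 8),
# t=3: 10100110111 (deg 10), t=4: 111111111111111 (deg 14)
```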

Codes constructed this way are called primitive (i.e. only block lengths of $2^m - 1$ are used) narrow-sense (i.e. we start with $\phi_1(x)$ instead of another $\phi_i(x)$) BCH codes.

Encoding BCH codes

BCH codes can be encoded non-systematically, by simply multiplying a message polynomial by the generator polynomial, or systematically, by embedding the message itself in the codeword. The only requirement is that the transmitted codeword is divisible by $g(x)$. In the first case, we have

$$v(x) = m(x)g(x)$$

In the second case, we have

$$v(x) = m(x)x^{n-k} + r(x), \quad \text{where } r(x) = m(x)x^{n-k} \bmod g(x)$$

so, $m(x)$ gets shifted to the MSB of $v(x)$, and then the leftover bits get modified to make $v(x)$ divisible by $g(x)$.

Non-systematic encoding is trivial to implement: just multiply the message by the generator and simplify. To implement systematic encoding, we shift the message polynomial so that its MSB is at the MSB of the codeword, and then modify the zeros at the LSB of the codeword (a span of length $n - k$) in a way that makes the resulting codeword divisible by $g(x)$. Practically this looks like the following:

For a codeword represented by a polynomial of degree $n - 1$, and a message represented by a polynomial of degree $k - 1$, the generator polynomial is of degree $n - k$, so:

  1. We multiply the message polynomial by $x^{n-k}$ to shift it to MSB position in the codeword polynomial
  2. $v(x)$ must be divisible by $g(x)$ to be a valid codeword, so it can be written as $v(x) = a(x)g(x)$ for some $a(x)$ to be determined
  3. The difference between $v(x)$ and $m(x)x^{n-k}$ is the remainder $r(x)$, which we can compute by taking $m(x)x^{n-k}$ and finding the modulo with $g(x)$
  4. The encoded codeword is then $v(x) = m(x)x^{n-k} + r(x)$ (+ and − are the same in modulo 2 arithmetic). Since $r(x)$ is guaranteed to have degree less than $n - k$, it doesn’t interfere with the shifted message bits

So we need to find $r(x) = m(x)x^{n-k} \bmod g(x)$, and set those bits in the codeword after shifting the message to the MSB.

Let’s do an example of encoding a message using non-systematic and systematic encoding, using the (15, 11, 3) code that we computed previously. This code has $g(x) = x^4 + x + 1$. Our message can be any 11-length binary number, or 10th degree polynomial. We’ll choose 10100010001 as our message, or $m(x) = x^{10} + x^8 + x^4 + 1$.

For non-systematic encoding, we simply multiply $m(x)$ with $g(x)$, to get

$$v(x) = m(x)g(x) = x^{14} + x^{12} + x^{11} + x^{10} + x^9 + x^5 + x + 1$$

or 101111000100011 in binary.

For systematic encoding, we start by multiplying $m(x)$ by $x^4$ to obtain $x^{14} + x^{12} + x^8 + x^4$. We then use polynomial division mod 2 to find the remainder after dividing by $g(x)$, which happens to be zero (I never said I would choose a hard example – you can check that $x^{14} + x^{12} + x^8 + x^4 = (x^{10} + x^8 + x^7 + x^6 + x^5 + x^4)(x^4 + x + 1)$). So the systematically encoded codeword is 101000100010000, which indeed contains the original message verbatim in the MSB of the codeword.
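
Both encoders fit in a few lines if codewords are packed into integers. Here’s a sketch (helper names and the representation are my own) that reproduces the two codewords above:

```python
# Non-systematic and systematic encoders for the (15, 11) code with
# g(x) = x^4 + x + 1. Polynomials are ints (bit i = coefficient of x^i).
def clmul(a, b):                    # carry-less polynomial multiplication
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    return r

def clmod(a, b):                    # remainder of polynomial division a / b
    while a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

G, NK = 0b10011, 4                  # g(x) and n - k
m = 0b10100010001                   # m(x) = x^10 + x^8 + x^4 + 1

print(f"{clmul(m, G):015b}")                  # -> 101111000100011
shifted = m << NK                             # m(x) * x^(n-k)
print(f"{shifted ^ clmod(shifted, G):015b}")  # -> 101000100010000
```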

Decoding BCH codes

Decoding BCH codes is more involved than encoding them, but ultimately follows a procedure which is tedious but sane, and easy to implement in both software and hardware. We start with the received polynomial $r(x)$:

$$r(x) = v(x) + e(x)$$

where $v(x)$ is the transmitted codeword, and $e(x)$ is the error. The error for binary codes can be written

$$e(x) = x^{j_1} + x^{j_2} + \cdots + x^{j_\nu}$$

where $\{j_1, j_2, \ldots, j_\nu\}$ is the set of error locations. The syndromes $S_i$ are just $r(x)$ evaluated at $\alpha^i$ for $i = 1, \ldots, 2t$. An easier pen-and-paper way to evaluate the syndromes is to use the fact that, for a particular $\alpha^i$, if $\phi_i(x)$, the minimal polynomial for $\alpha^i$, is known, then

$$r(x) = a(x)\phi_i(x) + b_i(x)$$

for some $a(x)$, and where $b_i(x)$ is $r(x) \bmod \phi_i(x)$. Since $\phi_i(\alpha^i) = 0$, we can write

$$S_i = r(\alpha^i) = b_i(\alpha^i)$$
For example, for a (15, 7, 5) BCH code, if r(x) = 000000100000001, or $r(x) = x^8 + 1$, then since $t = 2$, we need to compute four syndromes, $S_1$, $S_2$, $S_3$, and $S_4$. We can do this by finding $b_i(x) = r(x) \bmod \phi_i(x)$ for $i = 1, 2, 3, 4$.

  • For the first syndrome, $b_1(x) = (x^8 + 1) \bmod (x^4 + x + 1) = x^2$, so $S_1 = b_1(\alpha) = \alpha^2$.
  • For the second syndrome, $\phi_2(x) = \phi_1(x)$, so $S_2 = b_1(\alpha^2) = \alpha^4$.
  • For the third syndrome, $b_3(x) = (x^8 + 1) \bmod (x^4 + x^3 + x^2 + x + 1) = x^3 + 1$, so $S_3 = b_3(\alpha^3) = \alpha^9 + 1 = \alpha^7$.
  • For the fourth syndrome, $\phi_4(x) = \phi_1(x)$, so $S_4 = b_1(\alpha^4) = \alpha^8$.

So $S_1 = \alpha^2$, $S_2 = \alpha^4$, $S_3 = \alpha^7$, and $S_4 = \alpha^8$.
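
As a cross-check, the syndromes can also be computed by brute-force evaluation of $r(\alpha^i)$, with no minimal polynomials involved. A sketch reusing the $GF(16)$ table construction from earlier (helper names are my own):

```python
# Evaluate S_i = r(alpha^i) directly: each set bit j of r contributes a
# term alpha^(i*j), with exponents wrapping modulo 15.
exp, x = [], 1
for _ in range(15):
    exp.append(x)
    x <<= 1
    if x & 0b10000:
        x ^= 0b10011                # alpha^4 = alpha + 1
log = {v: i for i, v in enumerate(exp)}

def syndrome(r, i):
    s = 0
    for j in range(15):
        if (r >> j) & 1:
            s ^= exp[i * j % 15]    # add alpha^(i*j)
    return s

r = 0b000000100000001               # r(x) = x^8 + 1
print([f"a^{log[s]}" if (s := syndrome(r, i)) else "0" for i in range(1, 5)])
# -> ['a^2', 'a^4', 'a^7', 'a^8']
```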

Since $r(\alpha^i) = e(\alpha^i)$, the syndromes can also be written

$$S_1 = e(\alpha) = \alpha^{j_1} + \alpha^{j_2} + \cdots + \alpha^{j_\nu}$$

$$S_2 = e(\alpha^2) = (\alpha^2)^{j_1} + (\alpha^2)^{j_2} + \cdots + (\alpha^2)^{j_\nu}$$

$$\vdots$$

$$S_{2t} = e(\alpha^{2t}) = (\alpha^{2t})^{j_1} + (\alpha^{2t})^{j_2} + \cdots + (\alpha^{2t})^{j_\nu}$$

or alternatively,

$$S_1 = \alpha^{j_1} + \alpha^{j_2} + \cdots + \alpha^{j_\nu}$$

$$S_2 = (\alpha^{j_1})^2 + (\alpha^{j_2})^2 + \cdots + (\alpha^{j_\nu})^2$$

$$\vdots$$

$$S_{2t} = (\alpha^{j_1})^{2t} + (\alpha^{j_2})^{2t} + \cdots + (\alpha^{j_\nu})^{2t}$$
Solving for the unknowns $\alpha^{j_1}, \ldots, \alpha^{j_\nu}$ is the goal of any algorithm for decoding BCH codes. However, this system of equations has many possible solutions in general. If the number of errors is no more than the designed error correcting capability $t$, then the solution which yields the smallest number of errors is the correct solution. This corresponds to the solution with the smallest $\nu$.

The syndrome-error equations can be simplified by defining the error location numbers

$$\beta_l = \alpha^{j_l}$$

after which the syndrome-error equations can be written

$$S_1 = \beta_1 + \beta_2 + \cdots + \beta_\nu$$

$$S_2 = \beta_1^2 + \beta_2^2 + \cdots + \beta_\nu^2$$

$$\vdots$$

$$S_{2t} = \beta_1^{2t} + \beta_2^{2t} + \cdots + \beta_\nu^{2t}$$
These are power sum symmetric functions. Using the $\beta_l$’s, we can define the error locator polynomial as

$$\sigma(x) = (1 + \beta_1 x)(1 + \beta_2 x) \cdots (1 + \beta_\nu x) = \sigma_0 + \sigma_1 x + \cdots + \sigma_\nu x^\nu$$

The roots of $\sigma(x)$ are $\beta_1^{-1}$, $\beta_2^{-1}$, …, $\beta_\nu^{-1}$, which are the inverses of the error locations. This leap of defining an additional polynomial might seem arbitrary, but it’s for a good reason. If $\sigma(x)$ is expanded out, the relationship between its coefficients and the $\beta_l$’s are in the form of elementary symmetric polynomials:

$$\sigma_0 = 1$$

$$\sigma_1 = \beta_1 + \beta_2 + \cdots + \beta_\nu$$

$$\sigma_2 = \beta_1\beta_2 + \beta_1\beta_3 + \cdots + \beta_{\nu-1}\beta_\nu$$

$$\vdots$$

$$\sigma_\nu = \beta_1\beta_2\cdots\beta_\nu$$
From the theory of elementary symmetric polynomials (but feel free to check that this holds for the first few equations), we know that the syndromes and the coefficients of $\sigma(x)$ must obey Newton’s identities:

$$S_1 + \sigma_1 = 0$$

$$S_2 + \sigma_1 S_1 + 2\sigma_2 = 0$$

$$S_3 + \sigma_1 S_2 + \sigma_2 S_1 + 3\sigma_3 = 0$$

$$\vdots$$

$$S_\nu + \sigma_1 S_{\nu-1} + \cdots + \sigma_{\nu-1} S_1 + \nu\sigma_\nu = 0$$

$$S_{\nu+1} + \sigma_1 S_\nu + \cdots + \sigma_{\nu-1} S_2 + \sigma_\nu S_1 = 0$$

$$\vdots$$

(Note that in a field of characteristic 2, the integer multiple $i\sigma_i$ equals $\sigma_i$ when $i$ is odd and 0 when $i$ is even.)
So defining $\sigma(x)$ in this way links the error location values $\beta_l$ with the syndromes via Newton’s identities. Our objective is to determine the error locator polynomial’s coefficients, after which, the roots of the error locator polynomial can be solved for. There may be many $\sigma(x)$’s that satisfy the identities, but we want the one with minimal degree. The process for decoding the error has three steps:

  1. Compute the syndromes by evaluating $r(x)$ at $\alpha^i$ for $i = 1, \ldots, 2t$
  2. Using the syndromes, compute the error locator polynomial $\sigma(x)$, whose roots are the inverses of the locations of the errors
  3. For binary codes, once the error locations are known, those bits just need to be flipped in the code. A practical consideration is that the location exponents effectively run from 1 to n instead of 0 to n-1, since $\alpha^0 = \alpha^n$: an error location of $\alpha^j$ with $1 \le j \le n-1$ flips the coefficient of $x^j$, while an error location of $\alpha^n = 1$ flips the coefficient of $x^0$, the least significant bit.

In this section, we’ll cover one particular algorithm for decoding BCH codes. Berlekamp’s algorithm gives an iterative procedure for solving for the $\sigma(x)$ with smallest degree that satisfies Newton’s identities. The algorithm does this by constructing trial polynomials $\sigma^{(\mu)}(x)$ which satisfy the first $\mu$ Newton’s identities, one identity at a time, until $\sigma^{(2t)}(x)$ is reached, which is the minimal degree polynomial that satisfies the first 2t Newton’s identities. Once $\sigma(x)$ is known, all that remains to be done is to find its roots by exhaustively substituting all $n$ non-zero field elements $\alpha^i$ to see which ones result in $\sigma(\alpha^i) = 0$. The error locations are then the inverses of those $\alpha^i$’s. Remember, in a finite field the inverse of $\alpha^i$ is simply $\alpha^{n-i}$, where $n + 1$ is the order of the field.

Let’s go into detail about the procedure. First, let

$$\sigma^{(\mu)}(x) = 1 + \sigma_1^{(\mu)} x + \sigma_2^{(\mu)} x^2 + \cdots + \sigma_{l_\mu}^{(\mu)} x^{l_\mu}$$

be the trial polynomial constructed on the $\mu$th iterative step, whose coefficients produce a polynomial that satisfies the first $\mu$ Newton’s identities. $l_\mu$ is the degree of $\sigma^{(\mu)}(x)$. To determine $\sigma^{(\mu+1)}(x)$, we must make a correction to $\sigma^{(\mu)}(x)$ to make sure it satisfies the first $\mu + 1$ Newton’s identities. We do this by computing the discrepancy:

$$d_\mu = S_{\mu+1} + \sigma_1^{(\mu)} S_\mu + \sigma_2^{(\mu)} S_{\mu-1} + \cdots + \sigma_{l_\mu}^{(\mu)} S_{\mu+1-l_\mu}$$

if $d_\mu = 0$, we simply set $\sigma^{(\mu+1)}(x) = \sigma^{(\mu)}(x)$. If $d_\mu$ is nonzero, we need to look to a previous iteration, indexed at $\rho$, where $d_\rho \neq 0$ and $\rho - l_\rho$ is largest. With the $\sigma^{(\rho)}(x)$ from that iteration, set

$$\sigma^{(\mu+1)}(x) = \sigma^{(\mu)}(x) + d_\mu d_\rho^{-1} x^{\mu-\rho} \sigma^{(\rho)}(x)$$

which gives the minimal degree polynomial which satisfies the first $\mu + 1$ Newton’s identities. This step will definitely be confusing for any reasonable person, but hopefully it will make sense when we do an example. Also, so far the whole procedure has been presented abstractly, but the thing to remember is that everything, from the coefficients of $\sigma^{(\mu)}(x)$ to the discrepancies $d_\mu$ to the error locator values $\beta_l$, are all elements of the finite field. So they’re all some $\alpha^i$ (including $1 = \alpha^0$) or 0.

The Berlekamp procedure is best summarized with a table. Regardless of the problem, the default starting table is

$\mu$   $\sigma^{(\mu)}(x)$   $d_\mu$   $l_\mu$   $\mu - l_\mu$
$-1$    $1$                   $1$       $0$       $-1$
$0$     $1$                   $S_1$     $0$       $0$

Rows are filled in one at a time until row $2t$ is reached, and we start on row 0. Assuming we’ve filled up to the $\mu$th row, filling the $(\mu+1)$st row is done as follows:

  1. Compute the discrepancy $d_\mu$ for the $\mu$th row
  2. If $d_\mu = 0$, then $\sigma^{(\mu+1)}(x) = \sigma^{(\mu)}(x)$ and $l_{\mu+1} = l_\mu$
  3. If $d_\mu \neq 0$, then find another row prior to (but not including) the $\mu$th row, row $\rho$, such that $d_\rho \neq 0$ and $\rho - l_\rho$ is the largest value. Then update $\sigma$ via

    $$\sigma^{(\mu+1)}(x) = \sigma^{(\mu)}(x) + d_\mu d_\rho^{-1} x^{\mu-\rho} \sigma^{(\rho)}(x)$$

    and $l_{\mu+1} = \max(l_\mu,\ l_\rho + \mu - \rho)$ – but this equation isn’t needed in practice though: $l_{\mu+1}$ is always the degree of $\sigma^{(\mu+1)}(x)$, which can be easily read off. A code sketch implementing these rules follows.
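
Here’s the promised sketch of the whole iteration (the data layout and helper names are my own; it follows the update rules above and reproduces the worked example below):

```python
# Berlekamp's algorithm over GF(16). Field elements are 4-bit ints, and
# sigma is a list of coefficients, lowest degree first.
exp, x = [], 1
for _ in range(15):
    exp.append(x)
    x <<= 1
    if x & 0b10000:
        x ^= 0b10011                          # alpha^4 = alpha + 1
log = {v: i for i, v in enumerate(exp)}

def gmul(a, b):                               # multiply via exponent addition
    return exp[(log[a] + log[b]) % 15] if a and b else 0

def berlekamp(S, t):                          # S[i] holds the syndrome S_(i+1)
    rows = [(-1, [1], 1, 0), (0, [1], S[0], 0)]   # (mu, sigma, d, l)
    for mu in range(2 * t):
        _, sigma, d, l = rows[-1]
        new = sigma[:]
        if d != 0:
            # prior row rho with d_rho != 0 that maximizes rho - l_rho
            rho, srho, drho, lrho = max(
                (row for row in rows[:-1] if row[2] != 0),
                key=lambda row: row[0] - row[3])
            scale = gmul(d, exp[-log[drho] % 15])   # d * drho^(-1)
            shift = mu - rho
            new += [0] * (len(srho) + shift - len(new))
            for i, c in enumerate(srho):            # add scale * x^shift * srho
                new[i + shift] ^= gmul(scale, c)
        lnew = len(new) - 1
        dnew = 0                                    # discrepancy of the new row
        if mu + 1 < 2 * t:
            dnew = S[mu + 1]
            for i in range(1, lnew + 1):
                if mu + 1 - i >= 0:
                    dnew ^= gmul(new[i], S[mu + 1 - i])
        rows.append((mu + 1, new, dnew, lnew))
    return rows[-1][1]

S = [1, 1, exp[10], 1, exp[10], exp[5]]       # syndromes of the example below
sigma = berlekamp(S, 3)
print([f"a^{log[c]}" if c else "0" for c in sigma])
# -> ['a^0', 'a^0', '0', 'a^5'], i.e. sigma(x) = 1 + x + alpha^5 x^3
```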

All of this sounds super complicated, but it really isn’t: it’s just tedious. Let’s do an example to see how it works in practice. Let’s say that we are using a (15, 5) triple error correcting code, and the transmitted codeword v = 000000000000000 is received as r = 001000000101000, or $r(x) = x^{12} + x^5 + x^3$. The first step is to compute the syndromes. There are six syndromes to compute since we are using a triple error correcting code. To follow the computations, the reader may find it useful to refer to the tables for $GF(16)$ that we compiled in an earlier section.

We need to compute syndromes for $\alpha^1$ to $\alpha^6$. The minimal polynomial for $\alpha$, $\alpha^2$, and $\alpha^4$ is $\phi_1(x) = x^4 + x + 1$. The minimal polynomial for $\alpha^3$ and $\alpha^6$ is $\phi_3(x) = x^4 + x^3 + x^2 + x + 1$. The minimal polynomial for $\alpha^5$ is $\phi_5(x) = x^2 + x + 1$. By pen and paper, the easiest way to compute the syndromes is to compute the remainder of $r(x)$ after dividing by $\phi_i(x)$, and evaluating the remainder at $\alpha^i$. Let’s do this for each $\phi_i(x)$:

$$r(x) \bmod \phi_1(x) = 1, \quad \text{so } S_1 = 1,\ S_2 = 1,\ S_4 = 1$$

$$r(x) \bmod \phi_3(x) = x^3 + x^2 + 1, \quad \text{so } S_3 = \alpha^9 + \alpha^6 + 1 = \alpha^{10} \text{ and } S_6 = \alpha^{18} + \alpha^{12} + 1 = \alpha^5$$

$$r(x) \bmod \phi_5(x) = x + 1, \quad \text{so } S_5 = \alpha^5 + 1 = \alpha^{10}$$

So:

$$S_1 = 1, \quad S_2 = 1, \quad S_3 = \alpha^{10}, \quad S_4 = 1, \quad S_5 = \alpha^{10}, \quad S_6 = \alpha^5$$
We’ll write down the final table first, and then dissect the results row by row:

$\mu$   $\sigma^{(\mu)}(x)$        $d_\mu$        $l_\mu$   $\mu - l_\mu$
$-1$    $1$                        $1$            $0$       $-1$
$0$     $1$                        $1$            $0$       $0$
$1$     $1 + x$                    $0$            $1$       $0$
$2$     $1 + x$                    $\alpha^5$     $1$       $1$
$3$     $1 + x + \alpha^5 x^2$     $0$            $2$       $1$
$4$     $1 + x + \alpha^5 x^2$     $\alpha^{10}$  $2$       $2$
$5$     $1 + x + \alpha^5 x^3$     $0$            $3$       $2$
$6$     $1 + x + \alpha^5 x^3$     –              –         –

We start at $\mu = 0$. Since our discrepancy (by default) is $d_0 = S_1 = 1$, which is not zero, to compute $\sigma^{(1)}(x)$, we need to choose a previous row where $d_\rho \neq 0$, with the largest $\rho - l_\rho$. Since we are not allowed to choose our current row, there is only one choice: $\rho = -1$. Using this row, we have

$$\sigma^{(1)}(x) = \sigma^{(0)}(x) + d_0 d_{-1}^{-1} x^{0-(-1)} \sigma^{(-1)}(x) = 1 + (1)(1)^{-1} x (1) = 1 + x$$
This allows us to move to the $\mu = 1$ row. At this row, since the degree of $\sigma^{(1)}(x)$ is 1, $l_1 = 1$ and so $\mu - l_\mu = 0$. Now we need to compute the discrepancy for this row, which is done by computing

$$d_1 = S_2 + \sigma_1^{(1)} S_1$$
Here, $S_2 = 1$, $\sigma_1^{(1)} = 1$, and $S_1 = 1$, so $d_1 = 1 + 1 = 0$. Since the discrepancy is zero, we’re free to move onto the next row by setting

$$\sigma^{(2)}(x) = \sigma^{(1)}(x) = 1 + x$$
so $\sigma^{(2)}(x) = 1 + x$, and $l_2 = 1$, and $\mu - l_\mu = 1$. Again, we need to compute the discrepancy:

$$d_2 = S_3 + \sigma_1^{(2)} S_2 = \alpha^{10} + (1)(1) = \alpha^5$$
Since $d_2$ is not zero, we need to select the previous row with $d_\rho \neq 0$ and largest $\rho - l_\rho$. We have two choices this time where $d_\rho \neq 0$: $\rho = -1$ and $\rho = 0$, of which $\rho = 0$ has the largest $\rho - l_\rho$. Using this row, we are able to compute $\sigma^{(3)}(x)$:

$$\sigma^{(3)}(x) = \sigma^{(2)}(x) + d_2 d_0^{-1} x^{2-0} \sigma^{(0)}(x)$$

$$= (1 + x) + \alpha^5 (1)^{-1} x^2 (1)$$

$$= 1 + x + \alpha^5 x^2$$
On this row, $l_3 = 2$ and $\mu - l_\mu = 1$. The discrepancy is

$$d_3 = S_4 + \sigma_1^{(3)} S_3 + \sigma_2^{(3)} S_2 = 1 + \alpha^{10} + \alpha^5 = 0$$
Since $d_3 = 0$, we are free to move onto $\mu = 4$ with the same $\sigma$:

$$\sigma^{(4)}(x) = \sigma^{(3)}(x) = 1 + x + \alpha^5 x^2$$
On this row, $l_4 = 2$ and $\mu - l_\mu = 2$. The discrepancy is

$$d_4 = S_5 + \sigma_1^{(4)} S_4 + \sigma_2^{(4)} S_3 = \alpha^{10} + 1 + \alpha^5 \alpha^{10} = \alpha^{10} + 1 + 1 = \alpha^{10}$$
Since $d_4$ is nonzero, we have to again choose a previous row. The options available with $d_\rho \neq 0$ are $\rho = -1$, $\rho = 0$, and $\rho = 2$. Of these, $\rho = 2$ has the largest $\rho - l_\rho$, of 1. With this $\rho$,

$$\sigma^{(5)}(x) = \sigma^{(4)}(x) + d_4 d_2^{-1} x^{4-2} \sigma^{(2)}(x)$$

$$= (1 + x + \alpha^5 x^2) + \alpha^{10} (\alpha^5)^{-1} x^2 (1 + x)$$

$$= 1 + x + \alpha^5 x^2 + \alpha^5 x^2 + \alpha^5 x^3$$

$$= 1 + x + \alpha^5 x^3$$

(where we used the fact that $(\alpha^5)^{-1} = \alpha^{10}$ and that $\alpha^{10}\alpha^{10} = \alpha^{20} = \alpha^5$)

At row 5, $l_5 = 3$ and $\mu - l_\mu = 2$. The discrepancy at this row is

$$d_5 = S_6 + \sigma_1^{(5)} S_5 + \sigma_2^{(5)} S_4 + \sigma_3^{(5)} S_3$$

$$= \alpha^5 + (1)\alpha^{10} + (0)(1) + \alpha^5 \alpha^{10}$$

$$= \alpha^5 + \alpha^{10} + 1$$

$$= 0$$
Since $d_5 = 0$, $\sigma^{(6)}(x) = \sigma^{(5)}(x)$. Furthermore, since we’ve reached row 6, which is the final row, we have found $\sigma(x)$. So the error locator polynomial is

$$\sigma(x) = 1 + x + \alpha^5 x^3$$
Now all that remains is to find the roots of $\sigma(x)$. We can do this by exhaustively trying all values of $\alpha^i$. We will do the first few to get a taste of the arithmetic, and then state the other results.

$$\sigma(1) = 1 + 1 + \alpha^5 = \alpha^5$$

$$\sigma(\alpha) = 1 + \alpha + \alpha^5\alpha^3 = 1 + \alpha + \alpha^8 = \alpha^5$$

$$\sigma(\alpha^2) = 1 + \alpha^2 + \alpha^5\alpha^6 = 1 + \alpha^2 + \alpha^{11} = \alpha^7$$

$$\sigma(\alpha^3) = 1 + \alpha^3 + \alpha^5\alpha^9 = 1 + \alpha^3 + \alpha^{14} = 0$$

So $\alpha^3$ is a root. Similarly, $\sigma(\alpha^{10}) = 0$ and $\sigma(\alpha^{12}) = 0$, so $\alpha^3$, $\alpha^{10}$, and $\alpha^{12}$ are the roots of the error locator polynomial. Note that the Galois field element 1 corresponds to $\alpha^0 = \alpha^{15}$. To find the error locations, we take the inverses of each of these:

$$(\alpha^3)^{-1} = \alpha^{12}, \quad (\alpha^{10})^{-1} = \alpha^5, \quad (\alpha^{12})^{-1} = \alpha^3$$

So the error locations are at bits 3, 5, and 12, which is where we put the errors in the beginning of the example. Again, note that if one of the roots of $\sigma(x)$ had been 1, then the inverse would also be $1 = \alpha^{15} = \alpha^0$, which indicates an error at the $x^0$ coefficient, i.e. the least significant bit, consistent with indexing the location exponents from 1 to 15 rather than from 0 to 14. At this stage, if we knew that the message was encoded systematically, we could just read off the 5 most significant bits. If we knew that the message was encoded non-systematically, we would have to divide the codeword by the generator polynomial to recover the message.
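
To close the loop, here’s a sketch of the final root search and correction for this example (helper names are my own):

```python
# Find the roots of sigma(x) = 1 + x + alpha^5 x^3 by trying every alpha^i,
# invert the roots to get error locations, and flip those bits of r.
exp, x = [], 1
for _ in range(15):
    exp.append(x)
    x <<= 1
    if x & 0b10000:
        x ^= 0b10011                 # alpha^4 = alpha + 1
log = {v: i for i, v in enumerate(exp)}

def gmul(a, b):
    return exp[(log[a] + log[b]) % 15] if a and b else 0

sigma = [1, 1, 0, exp[5]]            # coefficients, lowest degree first

def sigma_at(v):                     # evaluate sigma at a field element v
    acc, p = 0, 1
    for c in sigma:
        acc ^= gmul(c, p)
        p = gmul(p, v)
    return acc

roots = [i for i in range(15) if sigma_at(exp[i]) == 0]
locations = [-i % 15 for i in roots]  # inverse of alpha^i is alpha^(15 - i)
print(sorted(locations))              # -> [3, 5, 12]

r = 0b001000000101000
for j in locations:
    r ^= 1 << j                       # flip the erroneous coefficient of x^j
print(f"{r:015b}")                    # -> 000000000000000, as transmitted
```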