"Coding and Cryptography 001.ps.gz" - читать интересную книгу автора



Coding and Cryptography

T. W. K"orner

July 23, 1998

Transmitting messages across noisy channels is an important practical problem. Coding theory provides explicit ways of ensuring that messages remain legible even in the presence of errors. Cryptography, on the other hand, makes sure that messages remain unreadable -- except to the intended recipient. These complementary techniques turn out to have much in common mathematically.

Small print The syllabus for the course is defined by the Faculty Board Schedules (which are minimal for lecturing and maximal for examining). I should very much appreciate being told of any corrections or possible improvements and might even part with a small reward to the first finder of particular errors. This document is written in LaTeX2e and stored in the file labelled ~twk/IIA/Codes.tex on emu in (I hope) read permitted form. My e-mail address is twk@dpmms.

These notes are based on notes taken in the course of the previous lecturer, Dr Pinch. Most of their virtues are his, most of their vices mine. Although the course makes use of one or two results from probability theory and a few more from algebra, it is possible to follow the course successfully whilst taking these results on trust. There is a note on further reading at the end, but [7] is a useful companion for the first three quarters of the course (up to the end of section 8) and [9] for the remainder. Please note that vectors are row vectors unless otherwise stated.

Contents

1 What is an error correcting code?
2 Hamming's breakthrough
3 General considerations
4 Linear codes
5 Some general constructions
6 Polynomials and fields
7 Cyclic codes
8 Shift registers
9 A short homily on cryptography
10 Stream cyphers
11 Asymmetric systems
12 Commutative public key systems
13 Trapdoors and signatures
14 Further reading
15 First Sheet of Exercises
16 Second Sheet of Exercises

1 What is an error correcting code?

Originally, codes were a device for making messages hard to read. The study of such codes and their successors is called cryptography and will form the subject of the last quarter of these notes. However, in the 19th century the optical¹ and then the electrical telegraph made it possible to send messages speedily, but at a price. That price might be specified as so much per word or so much per letter. Obviously it made sense to have books of `telegraph codes' in which one five-letter combination, QWADR say, meant `please book quiet room for two' and another, QWNDR, meant `please book cheapest room for one'. Obviously, also, an error of one letter of a telegraph code could have unpleasant consequences.
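The danger can be seen in miniature with a small Python sketch. The two codebook entries are the ones from the text; the dictionary and the corrupted string are purely illustrative.

    # The two telegraph codewords mentioned above; a single
    # mistransmitted letter turns one valid codeword into another.
    codebook = {'QWADR': 'please book quiet room for two',
                'QWNDR': 'please book cheapest room for one'}

    garbled = 'QWNDR'         # 'QWADR' with its third letter corrupted
    print(codebook[garbled])  # wrong booking, and no hint that anything failed

Because every five-letter string is potentially a valid codeword, the error is not merely uncorrectable but undetectable.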

Today messages are usually sent as binary sequences like 01110010..., but the transmission of each digit still costs money. Because of this, messages are often `compressed', that is, shortened by removing redundant structure.² In recognition of this fact we shall assume that we are asked to consider a collection of m messages each of which is equally likely.

¹See The Count of Monte Cristo and various Napoleonic sea stories. A statue to the inventor of the optical telegraph (semaphore) used to stand somewhere in Paris but seems to have disappeared.

Our model is the following. When the `source' produces one of the m possible messages μ_i say, it is fed into a `coder' which outputs a string c_i of n binary digits. The string is then transmitted one digit at a time along a `communication channel'. Each digit has probability p of being mistransmitted (so that 0 becomes 1 or 1 becomes 0) independently of what happens to the other digits [0 ≤ p < 1/2]. The transmitted message is then passed through a `decoder' which either produces a message μ_j (where we hope that j = i) or an error message and passes it on to the `receiver'.
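To see the model in action, here is a minimal Python sketch of such a channel. The two-codeword code, the value p = 0.1 and the nearest-neighbour decoding rule are illustrative assumptions, not anything fixed by the notes.

    import random

    def transmit(codeword, p):
        # Each digit is mistransmitted (0 becomes 1, 1 becomes 0)
        # independently with probability p, as in the model above.
        return ''.join('10'[int(bit)] if random.random() < p else bit
                       for bit in codeword)

    def decode(received, code):
        # Nearest-neighbour decoding (an illustrative choice): return
        # the codeword differing from the received string in the
        # fewest places.
        return min(code, key=lambda c: sum(x != y for x, y in zip(c, received)))

    code = ['00000', '11111']   # a toy code C in {0,1}^5
    received = transmit('11111', p=0.1)
    print(received, '->', decode(received, code))

Running this a few times shows the point of the assumption p < 1/2: most received strings still lie closer to the codeword that was sent than to the other one.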

Exercise 1.1. Why do we not consider the case 1 ≥ p > 1/2? What if p = 1/2?

An obvious example is the transmission of data from a distant space probe where (at least in the early days) the coder had to be simple and robust but the decoder could be as complex as the designer wished. On the other hand the decoder in a home CD player must be cheap but the encoding system which produces the disc can be very expensive.

For most of the time we shall concentrate our attention on a code C ⊆ {0,1}^n consisting of the codewords c_i. We say that C has size m = |C|. If m is large then we can carry a large number of possible messages (that

²In practice the situation is more complicated. Engineers distinguish between irreversible `lossy compression' and reversible `lossless compression'. For compact discs, where bits are cheap, the sound recorded can be reconstructed exactly. For digital sound broadcasting, where bits are expensive, the engineers make use of knowledge of the human auditory system (for example, the fact that we cannot make out a very soft noise in the presence of loud noises) to produce a result that might sound perfect (or nearly so) to us but which is in fact not. For mobile phones there can be greater loss of data because users do not demand anything close to perfection. For digital TV the situation is still more striking, with a reduction in data content from film to TV of anything up to a factor of 60. However, medical and satellite pictures must be transmitted with no loss of data. Notice that lossless coding can be judged by absolute criteria but the merits of lossy coding can only be judged subjectively.