Network Working Group R. Braden
Request for Comments: 1071 ISI
D. Borman
Cray Research
C. Partridge
BBN Laboratories
September 1988
Computing the Internet Checksum
Status of This Memo
This memo summarizes techniques and algorithms for efficiently
computing the Internet checksum. It is not a standard, but a set of
useful implementation techniques. Distribution of this memo is
unlimited.
1. Introduction
This memo discusses methods for efficiently computing the Internet
checksum that is used by the standard Internet protocols IP, UDP, and
TCP.
An efficient checksum implementation is critical to good performance.
As advances in implementation techniques streamline the rest of the
protocol processing, the checksum computation becomes one of the
limiting factors on TCP performance, for example. It is usually
appropriate to carefully hand-craft the checksum routine, exploiting
every machine-dependent trick possible; a fraction of a microsecond
per TCP data byte can add up to a significant CPU time savings
overall.
In outline, the Internet checksum algorithm is very simple:
(1) Adjacent octets to be checksummed are paired to form 16-bit
integers, and the 1's complement sum of these 16-bit integers is
formed.
(2) To generate a checksum, the checksum field itself is cleared,
the 16-bit 1's complement sum is computed over the octets
concerned, and the 1's complement of this sum is placed in the
checksum field.
(3) To check a checksum, the 1's complement sum is computed over the
same set of octets, including the checksum field. If the result
is all 1 bits (-0 in 1's complement arithmetic), the check
succeeds.
Suppose a checksum is to be computed over the sequence of octets
A, B, C, D, ... , Y, Z. Using the notation [a,b] for the 16-bit
integer a*256+b, where a and b are bytes, then the 16-bit 1's
complement sum of these bytes is given by one of the following:
[A,B] +' [C,D] +' ... +' [Y,Z] [1]
[A,B] +' [C,D] +' ... +' [Z,0] [2]
where +' indicates 1's complement addition. These cases
correspond to an even or odd count of bytes, respectively.
On a 2's complement machine, the 1's complement sum must be
computed by means of an "end around carry", i.e., any overflows
from the most significant bits are added into the least
significant bits. See the examples below.
Section 2 explores the properties of this checksum that may be
exploited to speed its calculation. Section 3 contains some
numerical examples of the most important implementation
techniques. Finally, Section 4 includes examples of specific
algorithms for a variety of common CPU types. We are grateful
to Van Jacobson and Charley Kline for their contribution of
algorithms to this section.
The properties of the Internet checksum were originally
discussed by Bill Plummer in IEN-45, entitled "Checksum Function
Design". Since IEN-45 has not been widely available, we include
it as an extended appendix to this RFC.
2. Calculating the Checksum
This simple checksum has a number of wonderful mathematical
properties that may be exploited to speed its calculation, as we
will now discuss.
(A) Commutative and Associative
As long as the even/odd assignment of bytes is respected, the
sum can be done in any order, and it can be arbitrarily split
into groups.
For example, the sum [1] could be split into:
( [A,B] +' [C,D] +' ... +' [J,0] )
+' ( [0,K] +' ... +' [Y,Z] ) [3]
(B) Byte Order Independence
The sum of 16-bit integers can be computed in either byte order.
Thus, if we calculate the swapped sum:
[B,A] +' [D,C] +' ... +' [Z,Y] [4]
the result is the same as [1], except the bytes are swapped in
the sum! To see why this is so, observe that in both orders the
carries are the same: from bit 15 to bit 0 and from bit 7 to bit
8. In other words, consistently swapping bytes simply rotates
the bits within the sum, but does not affect their internal
ordering.
Therefore, the sum may be calculated in exactly the same way
regardless of the byte order ("big-endian" or "little-endian")
of the underlaying hardware. For example, assume a "little-
endian" machine summing data that is stored in memory in network
("big-endian") order. Fetching each 16-bit word will swap
bytes, resulting in the sum [4]; however, storing the result
back into memory will swap the sum back into network byte order.
Byte swapping may also be used explicitly to handle boundary
alignment problems. For example, the second group in [3] can be
calculated without concern to its odd/even origin, as:
[K,L] +' ... +' [Z,0]
if this sum is byte-swapped before it is added to the first
group. See the example below.
(C) Parallel Summation
On machines that have word-sizes that are multiples of 16 bits,
it is possible to develop even more efficient implementations.
Because addition is associative, we do not have to sum the
integers in the order they appear in the message. Instead we
can add them in "parallel" by exploiting the larger word size.
To compute the checksum in parallel, simply do a 1's complement
addition of the message using the native word size of the
machine. For example, on a 32-bit machine we can add 4 bytes at
a time: [A,B,C,D]+'... When the sum has been computed, we "fold"
the long sum into 16 bits by adding the 16-bit segments. Each
16-bit addition may produce new end-around carries that must be
added.
Furthermore, again the byte order does not matter; we could
instead sum 32-bit words: [D,C,B,A]+'... or [B,A,D,C]+'... and
then swap the bytes of the final 16-bit sum as necessary. See
the examples below. Any permutation is allowed that collects
all the even-numbered data bytes into one sum byte and the odd-
numbered data bytes into the other sum byte.
There are further coding techniques that can be exploited to speed up
the checksum calculation.
(1) Deferred Carries
Depending upon the machine, it may be more efficient to defer
adding end-around carries until the main summation loop is
finished.
One approach is to sum 16-bit words in a 32-bit accumulator, so
the overflows build up in the high-order 16 bits. This approach
typically avoids a carry-sensing instruction but requires twice
as many additions as would adding 32-bit segments; which is
faster depends upon the detailed hardware architecture.
(2) Unwinding Loops
To reduce the loop overhead, it is often useful to "unwind" the
inner sum loop, replicating a series of addition commands within
one loop traversal. This technique often provides significant
savings, although it may complicate the logic of the program
considerably.
(3) Combine with Data Copying
Like checksumming, copying data from one memory location to
another involves per-byte overhead. In both cases, the
bottleneck is essentially the memory bus, i.e., how fast the
data can be fetched. On some machines (especially relatively
slow and simple micro-computers), overhead can be significantly
reduced by combining memory-to-memory copy and the checksumming,
fetching the data only once for both.
(4) Incremental Update
Finally, one can sometimes avoid recomputing the entire checksum
when one header field is updated. The best-known example is a
gateway changing the TTL field in the IP header, but there are
other examples (for example, when updating a source route). In
these cases it is possible to update the checksum without
scanning the message or datagram.
To update the checksum, simply add the differences of the
sixteen bit integers that have been changed. To see why this
works, observe that every 16-bit integer has an additive inverse
and that addition is associative. From this it follows that
given the original value m, the new value m', and the old
checksum C, the new checksum C' is:
C' = C + (-m) + m' = C + (m' - m)
点击浏览该文件
(iwgh) |