织梦CMS - 轻松建站从此开始!

罗索

RFC1071: Computing the Internet Checksum

落鹤生 发布于 2008-05-06 19:07 点击:次 
ISI D.Borman Cray Research C. Partridge BBN Laboratories September 1988 Computing the Internet Checksum Status of This Memo This memo summarizes techniques and algorithms for efficiently comp
TAG:

Network Working Group                                         R.  Braden
Request for Comments: 1071                                           ISI
                                                              D.  Borman
                                                           Cray Research
                                                            C. Partridge
                                                        BBN Laboratories
                                                          September 1988


 


                    Computing the Internet Checksum


 


Status of This Memo


 

   This memo summarizes techniques and algorithms for efficiently
   computing the Internet checksum.  It is not a standard, but a set of
   useful implementation techniques.  Distribution of this memo is
   unlimited.


 

1.  Introduction


 

   This memo discusses methods for efficiently computing the Internet
   checksum that is used by the standard Internet protocols IP, UDP, and
   TCP.


 

   An efficient checksum implementation is critical to good performance.
   As advances in implementation techniques streamline the rest of the
   protocol processing, the checksum computation becomes one of the
   limiting factors on TCP performance, for example.  It is usually
   appropriate to carefully hand-craft the checksum routine, exploiting
   every machine-dependent trick possible; a fraction of a microsecond
   per TCP data byte can add up to a significant CPU time savings
   overall.


 

   In outline, the Internet checksum algorithm is very simple:


 

   (1)  Adjacent octets to be checksummed are paired to form 16-bit
        integers, and the 1's complement sum of these 16-bit integers is
        formed.


 

   (2)  To generate a checksum, the checksum field itself is cleared,
        the 16-bit 1's complement sum is computed over the octets
        concerned, and the 1's complement of this sum is placed in the
        checksum field.


 

   (3)  To check a checksum, the 1's complement sum is computed over the
        same set of octets, including the checksum field.  If the result
        is all 1 bits (-0 in 1's complement arithmetic), the check
        succeeds.


 

        Suppose a checksum is to be computed over the sequence of octets


 

        A, B, C, D, ... , Y, Z.  Using the notation [a,b] for the 16-bit
        integer a*256+b, where a and b are bytes, then the 16-bit 1's
        complement sum of these bytes is given by one of the following:


 

            [A,B] +' [C,D] +' ... +' [Y,Z]              [1]


 

            [A,B] +' [C,D] +' ... +' [Z,0]              [2]


 

        where +' indicates 1's complement addition. These cases
        correspond to an even or odd count of bytes, respectively.


 

        On a 2's complement machine, the 1's complement sum must be
        computed by means of an "end around carry", i.e., any overflows
        from the most significant bits are added into the least
        significant bits. See the examples below.


 

        Section 2 explores the properties of this checksum that may be
        exploited to speed its calculation.  Section 3 contains some
        numerical examples of the most important implementation
        techniques.  Finally, Section 4 includes examples of specific
        algorithms for a variety of common CPU types.  We are grateful
        to Van Jacobson and Charley Kline for their contribution of
        algorithms to this section.


 

        The properties of the Internet checksum were originally
        discussed by Bill Plummer in IEN-45, entitled "Checksum Function
        Design".  Since IEN-45 has not been widely available, we include
        it as an extended appendix to this RFC.


 

     2.  Calculating the Checksum


 

        This simple checksum has a number of wonderful mathematical
        properties that may be exploited to speed its calculation, as we
        will now discuss.


 


   (A)  Commutative and Associative


 

        As long as the even/odd assignment of bytes is respected, the
        sum can be done in any order, and it can be arbitrarily split
        into groups.


 

        For example, the sum [1] could be split into:


 

           ( [A,B] +' [C,D] +' ... +' [J,0] )


 

                  +' ( [0,K] +' ... +' [Y,Z] )               [3]


 

   (B)  Byte Order Independence


 

        The sum of 16-bit integers can be computed in either byte order.
        Thus, if we calculate the swapped sum:


 

           [B,A] +' [D,C] +' ... +' [Z,Y]                   [4]


 

        the result is the same as [1], except the bytes are swapped in
        the sum! To see why this is so, observe that in both orders the
        carries are the same: from bit 15 to bit 0 and from bit 7 to bit
        8.  In other words, consistently swapping bytes simply rotates
        the bits within the sum, but does not affect their internal
        ordering.


 

        Therefore, the sum may be calculated in exactly the same way
        regardless of the byte order ("big-endian" or "little-endian")
        of the underlaying hardware.  For example, assume a "little-
        endian" machine summing data that is stored in memory in network
        ("big-endian") order.  Fetching each 16-bit word will swap
        bytes, resulting in the sum [4]; however, storing the result
        back into memory will swap the sum back into network byte order.


 

        Byte swapping may also be used explicitly to handle boundary
        alignment problems.  For example, the second group in [3] can be
        calculated without concern to its odd/even origin, as:


 

              [K,L] +' ... +' [Z,0]


 

        if this sum is byte-swapped before it is added to the first
        group.  See the example below.


 

   (C)  Parallel Summation


 

        On machines that have word-sizes that are multiples of 16 bits,
        it is possible to develop even more efficient implementations.
        Because addition is associative, we do not have to sum the
        integers in the order they appear in the message.  Instead we
        can add them in "parallel" by exploiting the larger word size.


 

        To compute the checksum in parallel, simply do a 1's complement
        addition of the message using the native word size of the
        machine.  For example, on a 32-bit machine we can add 4 bytes at
        a time: [A,B,C,D]+'... When the sum has been computed, we "fold"
        the long sum into 16 bits by adding the 16-bit segments.  Each
        16-bit addition may produce new end-around carries that must be
        added.


 

        Furthermore, again the byte order does not matter; we could
        instead sum 32-bit words: [D,C,B,A]+'... or [B,A,D,C]+'... and
        then swap the bytes of the final 16-bit sum as necessary.  See
        the examples below.  Any permutation is allowed that collects
        all the even-numbered data bytes into one sum byte and the odd-
        numbered data bytes into the other sum byte.


 


   There are further coding techniques that can be exploited to speed up
   the checksum calculation.


 

   (1)  Deferred Carries


 

        Depending upon the machine, it may be more efficient to defer
        adding end-around carries until the main summation loop is
        finished.


 

        One approach is to sum 16-bit words in a 32-bit accumulator, so
        the overflows build up in the high-order 16 bits.  This approach
        typically avoids a carry-sensing instruction but requires twice
        as many additions as would adding 32-bit segments; which is
        faster depends upon the detailed hardware architecture.


 

   (2)  Unwinding Loops


 

        To reduce the loop overhead, it is often useful to "unwind" the
        inner sum loop, replicating a series of addition commands within
        one loop traversal.  This technique often provides significant
        savings, although it may complicate the logic of the program
        considerably.


 

   (3)  Combine with Data Copying


 

        Like checksumming, copying data from one memory location to
        another involves per-byte overhead.  In both cases, the
        bottleneck is essentially the memory bus, i.e., how fast the
        data can be fetched. On some machines (especially relatively
        slow and simple micro-computers), overhead can be significantly
        reduced by combining memory-to-memory copy and the checksumming,
        fetching the data only once for both.


 

   (4)  Incremental Update


 

        Finally, one can sometimes avoid recomputing the entire checksum
        when one header field is updated.  The best-known example is a
        gateway changing the TTL field in the IP header, but there are
        other examples (for example, when updating a source route).  In
        these cases it is possible to update the checksum without
        scanning the message or datagram.


 

        To update the checksum, simply add the differences of the
        sixteen bit integers that have been changed.  To see why this
        works, observe that every 16-bit integer has an additive inverse
        and that addition is associative.  From this it follows that
        given the original value m, the new value m', and the old
        checksum C, the new checksum C' is:


 

                C' = C + (-m) + m' = C + (m' - m)

 点击浏览该文件

(iwgh)
本站文章除注明转载外,均为本站原创或编译欢迎任何形式的转载,但请务必注明出处,尊重他人劳动,同学习共成长。转载请注明:文章转载自:罗索实验室 [http://www.rosoo.net/a/200805/5127.html]
本文出处: 作者:iwgh
顶一下
(0)
0%
踩一下
(0)
0%
------分隔线----------------------------
发表评论
请自觉遵守互联网相关的政策法规,严禁发布色情、暴力、反动的言论。
评价:
表情:
用户名: 验证码:点击我更换图片
栏目列表
将本文分享到微信
织梦二维码生成器
推荐内容