    A Review of IP Packet Compression Techniques

Ching Shen Tye and Dr. G. Fairhurst
Electronics Research Group, Department of Engineering,
Aberdeen University, Scotland, AB24 3UE.
{c.tye, gorry}@erg.abdn.ac.uk

    Abstract

This paper investigates several compression techniques that may improve the IP packet delivery process and identifies their limitations. The recent emergence and popularity of the wireless Internet has triggered a demand for improved transmission efficiency, especially where the link has a high cost per transmitted byte. Packet compression at the link layer may speed up the delivery process and provide more efficient use of the available capacity. However, the success of compression depends on several factors: the protocol headers present, the use of encryption, and the type of data being sent.

    1 INTRODUCTION

IP networking is replacing many existing networks, e.g. IP telephony may replace a circuit-switched telephone network. However, IP introduces packet overhead, and this raises the question of efficiency (defined as the ratio between the total number of information bytes and the total number of received bytes). Efficiency is important where the cost of transmission is high. Examples include fixed-rate links where the speed of transmission is limited, or wireless links where there is a cost for using the radio bandwidth. One effect of low efficiency is that there is less capacity available for other services. It also increases the transit delay of the packets across the link (a larger packet takes longer to serialise).
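As a worked illustration of this definition (the packet sizes are chosen for the example, not taken from the paper), a 160 B payload carried behind 40 B of protocol headers gives

\[
\eta = \frac{\text{information bytes}}{\text{received bytes}} = \frac{160}{160 + 40} = 0.8,
\]

i.e. 20% of the link capacity is consumed by overhead.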

Furthermore, IP version 6 (IPv6) [1] is being deployed in many next-generation networks, especially in broadband wireless networks. IPv6 increases the size of an IP address from 32 bits in IPv4 [2] to 128 bits, resulting in a doubling of the minimum IP header to 40 B.

    2 LINK COMPRESSION

One well-known way to improve efficiency is to use data compression [3]. This process attempts to yield a compact digital representation of the information, and sends this in place of the original information.

When compression is used at the link layer, it can improve transmission efficiency. There are two implications. First, there is a computational cost associated with the algorithms for compression (at the sender side of the link) and decompression (at the link receiver). This may require additional processor hardware, and may introduce extra delay. In some cases this cost can be justified in terms of the improved efficiency and reduced bandwidth. A second drawback arises from the way in which compression is performed. To decompress a packet, the decompressor also needs to obtain information about the way in which the compression was performed, which we call the context [4]. For correct decompression, this context information needs to be reliably sent by the compressor to the decompressor.

In this paper, we assume that successfully decompressed packets are semantically identical to the original packets. In practice, this means that the decompressed packets must exactly match the original packets. This implies the use of lossless data compression techniques.

    3 COMPRESSION TECHNIQUES

    3.1 Bulk Compression

The most common form of compression used in computer software is bulk compression. In this technique, all the information in the packets is treated as a block and compressed using a compression algorithm. The compressor constructs a dictionary of common sequences within the information, and matches each sequence to a shorter compressed representation (key strings). Bulk compression may use a pre-defined dictionary (a static context for all IP flows) or a running dictionary built by the compression algorithm (i.e. optimised for a particular IP flow). The receiver must use an identical dictionary for decompression. This method can achieve high compression ratios; however, it has two major drawbacks:


• The dictionary requires a large memory.

• The dictionaries at the compressor and decompressor must be synchronised. A running dictionary system may lose synchronisation (or context) when used over a link subject to packet loss. A loss of synchronisation will cause the receiver to discard all packets until the receiver's dictionary is re-synchronised.

To overcome the limitations on memory and link quality, packet-by-packet dictionary algorithms [5] were invented. These compress each packet individually, sending the context with the packet. This prevents a loss of synchronisation, and also requires a smaller dictionary (less memory). The trade-off is that this achieves lower compression ratios.
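As a minimal Python sketch of the per-packet idea (our illustration, not code from the paper or from [5]), the example below uses LZW [9] and rebuilds the dictionary from scratch for every packet, so no running state is shared across packets and a lost packet cannot desynchronise the receiver:

    def lzw_compress_packet(packet: bytes) -> list[int]:
        # Fresh dictionary per packet: codes 0-255 are the single bytes,
        # so compressor and decompressor share no long-lived state.
        dictionary = {bytes([i]): i for i in range(256)}
        next_code = 256
        w = b""
        codes = []
        for byte in packet:
            wb = w + bytes([byte])
            if wb in dictionary:
                w = wb                       # extend the current match
            else:
                codes.append(dictionary[w])  # emit code for the longest match
                dictionary[wb] = next_code   # learn the new sequence
                next_code += 1
                w = bytes([byte])
        if w:
            codes.append(dictionary[w])
        return codes

Because the dictionary is rebuilt for each packet it stays small, but long matches have less opportunity to accumulate, which is why the compression ratio is lower than with a running dictionary.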

Another kind of compression technique is the Guess-Table-Based compression algorithm [5]. At the sender, this uses an algorithm to guess the next byte(s) of data based on the previous data. If a guess is correct, the byte is not transmitted on the link. Both receiver and transmitter must use the same algorithm (context) to successfully decompress the data at the receiver.
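A minimal sketch of the guess-table idea, loosely modelled on Predictor-style algorithms (the hash function and the 64 K table size are our assumptions, not details given in [5]):

    def guess_compress(packet: bytes, table: list) -> tuple:
        # 'table' is the shared context: it maps a hash of recent bytes to
        # a guess for the next byte. The caller initialises it identically
        # at both ends, e.g. table = [0] * 0x10000.
        flags, literals = [], []
        h = 0
        for byte in packet:
            if table[h] == byte:
                flags.append(1)          # guess correct: byte not sent
            else:
                flags.append(0)
                literals.append(byte)    # guess wrong: send the literal
                table[h] = byte          # both ends learn the same update
            h = ((h << 4) ^ byte) & 0xFFFF
        return flags, literals

The decompressor runs the identical loop: on a 1 flag it outputs table[h], on a 0 flag it consumes the next literal and updates table[h], so both tables stay synchronised as long as no packet is lost.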

4 HEADER COMPRESSION

Bulk compression achieves little benefit when used on protocol header information. The structure of this information varies from packet to packet and from field to field within the packet headers. Standard bulk compression algorithms cannot take advantage of this structure; however, a compression algorithm that understands the syntax (and possibly semantics) of the packet headers may be able to achieve considerable benefit by exploiting the redundancy which is often present in successive packets of the same IP flow. This is called Header Compression.

    4.1 Van Jacobson Header Compression (VJHC)

The Van Jacobson Header Compression scheme [6] relies on knowledge of the TCP/IPv4 headers. The algorithm first classifies packets into individual flows (i.e. packets that share the same set of {IP addresses, IP protocol type, and TCP port numbers}). State (a context) is then created for each flow, and a Context ID (CID) is assigned to identify the flow at the compressor and decompressor. The sender then omits fields in the header that remain unchanged between successive packets in an IP flow (these may be deduced by using the CID in each packet to refer to the context state at the decompressor).
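A small Python sketch of this classification step (the packet attribute names are hypothetical, and a real implementation bounds the table and reuses stale slots rather than letting it grow):

    def flow_key(pkt):
        # Packets sharing these values belong to the same flow.
        return (pkt.src_ip, pkt.dst_ip, pkt.protocol,
                pkt.src_port, pkt.dst_port)

    cid_table = {}   # flow key -> Context ID (CID), mirrored at the decompressor

    def assign_cid(pkt) -> int:
        key = flow_key(pkt)
        if key not in cid_table:
            cid_table[key] = len(cid_table)   # next free CID
        return cid_table[key]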

VJHC compresses the IPv4 and TCP headers together as a combined set of fields. Figure 1 shows the TCP/IPv4 header fields. Within an IP flow, more than half of the fields are unchanged between successive packets. The total length and identification fields are expected to be handled by link framing protocols. The IP checksum can also be re-calculated at the receiver. By suppressing these fields at the compressor and restoring them at the decompressor, VJHC can significantly improve transmission efficiency for the packet header.

Figure 1 TCP/IPv4 header behaviour

Furthermore, the remaining changing fields do not frequently change at the same time, and the compressed header can thus omit them in most cases. The remaining fields usually change only by a small amount between successive packets. A further saving can be achieved by transmitting the difference in the value of a field (i.e. differential encoding) rather than the entire field.
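A sketch of differential encoding for a single changing field (the one-byte/escape wire format below is our simplification, not the exact RFC 1144 encoding):

    def encode_delta(new: int, old: int) -> bytes:
        # Send the change in the field rather than the field itself; small,
        # common changes (e.g. in a 32-bit TCP sequence number) fit in 1 byte.
        d = (new - old) & 0xFFFFFFFF
        if 0 < d < 0x100:
            return bytes([d])                     # 1 byte on the wire
        return b"\x00" + d.to_bytes(4, "big")     # escape: 0, then full delta

    def decode_delta(buf: bytes, old: int) -> int:
        d = int.from_bytes(buf[1:5], "big") if buf[0] == 0 else buf[0]
        return (old + d) & 0xFFFFFFFF

Note that decoding depends on 'old', the value held in the context state; this is exactly why a lost packet invalidates the context, as discussed below.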

VJHC relies on two types of error detection: a CRC at the link layer (to detect corruption of the compressed packet) and the TCP checksum at the transport layer (to detect corruption in the decompressed packet). When errors are detected, the receiver discards the erroneous packet. This creates another problem in the decompression process. Since differential compression techniques are applied, the receiver also loses the context state. The next packet following the discarded packet therefore cannot be decompressed correctly, and must also be discarded. All subsequent packets will be discarded until the next synchronisation (i.e. an uncompressed packet is received, restoring the context state). To limit this error propagation, the receiver may apply the differential sequence number change carried in an incoming compressed packet to the sequence number of the last correctly received packet, and so generate a correct sequence number for the packet following the discarded one.

Errors detected by the TCP checksum of packets received by the destination end host must also be considered. When the end host fails to receive TCP data segments (on the forward path), no TCP acknowledgement is sent. The sender eventually suffers a TCP timeout and resends the missing segment(s); these retransmissions also trigger the compressor to resynchronise the context state.

The combined IPv4 and TCP headers can typically be reduced by VJHC from 40 B to 4 B (i.e. 10% of the original size). The technique significantly improves performance over low-speed (300 to 19,200 bps) serial links.

The main disadvantages of VJHC are the impact of a loss of synchronisation (when not used with a reliable link protocol) and the combined compression of the TCP and IPv4 headers. The scheme does not support recent changes to IP (e.g. ECN, Diffserv) or TCP (e.g. SACK, ECN, the TS option, LFN). The combined compression also prevents the VJHC algorithm from compressing packets with additional headers placed between the IP and TCP headers (e.g. IPSEC AH, or tunnel encapsulations). Nor will it compress IPv6, or other transport protocols (such as UDP).

    4.2 IP Header Compression (IPHC)

Internet Protocol Header Compression [7] uses the same mechanism as VJHC, omitting the unchanged fields in the header. The main difference between IPHC and VJHC is that IPHC compresses only the IP header. This supports any transport protocol or tunnel encapsulation, as well as ECN and IPv6. IPHC also allows extension to multicast and multi-access links. Like VJHC, a 16-bit CID is used for TCP flows, but a larger CID space is assigned to non-TCP flows.

IPHC handles errors in a similar way to VJHC. IPHC can also use other recovery methods to recover from a TCP checksum failure (e.g. the TWICE algorithm [8]). For non-TCP packets, a periodic uncompressed packet is sent to improve the probability of the context information being correct. This additional information reduces efficiency, but helps bound the impact of packet loss.
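A sketch of such a refresh rule for non-TCP flows (the doubling schedule and cap are our illustration; RFC 2507 describes an exponential "compression slow-start" along these lines):

    class RefreshTimer:
        def __init__(self, max_interval: int = 256):
            self.interval = 1        # packets between full headers
            self.since_full = 1      # force a full header on the first packet
            self.max_interval = max_interval

        def next_packet_is_full(self) -> bool:
            # Periodically resend an uncompressed header so a decompressor
            # whose context was invalidated by loss can resynchronise
            # without any feedback channel.
            if self.since_full >= self.interval:
                self.since_full = 0
                self.interval = min(self.interval * 2, self.max_interval)
                return True          # send uncompressed, refreshing the context
            self.since_full += 1
            return False             # send compressed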

IPHC may reduce the IP header to 2 bytes for a non-TCP session and 4 bytes for a TCP session. The main advantage of IPHC is its independence from the transport-layer protocols. The drawback is that IPHC is only half as efficient as VJHC for TCP packets.

5 ROBUST HEADER COMPRESSION (ROHC)

The RObust Header Compression scheme [4] is a new header compression scheme being developed in the ROHC Working Group of the IETF. Compared with the previous schemes (section 4), its major advantages are high robustness and improved efficiency.

A key feature is that the ROHC framework is extensible. This means that new protocols can be added without the need to design a completely new compression protocol. The monolithic approach of VJHC now seems inappropriate, considering the widespread use of tunnel encapsulations, the increasing use of security protocols and the emergence of IPv6. The penalty for flexibility is that ROHC is a complicated technique, absorbing all the existing compression techniques and adding a more sophisticated mechanism to achieve robustness and reliability.

Compression and decompression are treated as finite state machines and can be broken into a series of states. At the compressor, these are Initialization & Refresh (IR), First Order (FO) and Second Order (SO). At the decompressor there are three states: No Context (NC), Static Context (SC) and Full Context (FC).

The link starts with no context state for a flow (CID), and therefore can only perform limited compression. Once the compressor has successfully installed context state at the receiver, it may transit to the next state (e.g. the compressor will gradually transit forward from IR to FO to SO as it becomes sufficiently confident that the decompressor has all the information needed to decompress the header).
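A compact sketch of the compressor-side state machine (the transition triggers are simplified; the actual RFC 3095 conditions depend on the operating mode, confidence counters and feedback):

    COMPRESSOR_STATES = ("IR", "FO", "SO")   # least to most compressed

    class RohcCompressorFsm:
        def __init__(self):
            self.state = 0    # start in IR: send full (uncompressed) headers

        def gained_confidence(self):
            # Move forward one state once sufficiently confident that the
            # decompressor holds the context needed for the next level.
            self.state = min(self.state + 1, len(COMPRESSOR_STATES) - 1)

        def context_damage_suspected(self):
            # Errors or negative feedback push back toward IR.
            self.state = max(self.state - 1, 0)

        @property
        def name(self) -> str:
            return COMPRESSOR_STATES[self.state]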

ROHC defines three operation modes [4]: Unidirectional (U-mode), Bidirectional Optimistic (O-mode) and Bidirectional Reliable (R-mode). All ROHC operation starts in U-mode, and may then transit to O-mode or R-mode depending on the feedback information. U-mode uses a periodic refresh and timeout to update the context (as in IPHC for UDP), O-mode uses the feedback channel for error correction, and R-mode utilises all the available resources to prevent any context loss or loss of synchronisation.


Each operation mode contains all three states (IR, FO and SO), as shown in Figure 2 below. Any errors that occur will force the compressor back to a lower compression state, or gradually lower the operation mode if more errors are detected or propagated.

    Figure 2 ROHC Operation modes

The ROHC basic packet format is shown in Figure 3; it consists of padding, feedback, header information and payload. These four generic fields allow the decompressor to generate feedback information and return it to the compressor quickly.

    Figure 3 ROHC basic packet structure

Padding: Located at the beginning of the compressed header; this aligns headers to byte boundaries.

Feedback: Feedback information travels from the decompressor to the compressor, to assist in synchronisation of the context state.

Header: Compressed header information.

Payload: Packet payload data.

As with the earlier header compression techniques (section 4), ROHC still relies on a link-layer CRC to verify the integrity of the compressed packet. To gain the full benefit, ROHC relies on a feedback acknowledgment channel for error discovery and recovery. In some cases a CRC is also used to protect an entire segment, such as the initialisation and refresh header and ROHC fragmentation.

The main benefit of ROHC is that it may be designed to compress any type of header. For links that may experience loss, it may also be made more robust than existing schemes such as VJHC and IPHC. Moreover, it offers significant benefits for small packets, such as TCP ACKs or VoIP packets, when sent over a link with limited capacity or low transmission speed.

6 EXPERIMENTAL RESULTS

6.1 Packet-by-packet bulk compression

This section investigates the performance of several bulk compression algorithms (section 3) when used over the entire IP packet. 10,000 TCP/IPv4 packets were captured from an Ethernet LAN and compressed by both the Huffman [3] and Lempel-Ziv Welch (LZW) [9] algorithms. The relationship between compression ratio and original packet length for each compression algorithm is shown in Figures 7(a) and (b) respectively.

Figure 7(a) shows two curves corresponding to two distinct types of packet payload. The compression ratio of the packets in the lower curve is consistently lower than 1 throughout the entire range of packet lengths (i.e. these packets are expanded in size by the compression algorithm, rather than being reduced). Packets in this class include those that contain random (or pre-compressed) data, which cannot be successfully compressed by the link compressor using dictionary-based compression techniques such as Huffman coding and LZW.
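As used in these figures (our reading, consistent with "lower than 1" meaning expansion), the compression ratio is

\[
\text{compression ratio} = \frac{\text{original packet length}}{\text{compressed packet length}},
\]

so values below 1 indicate that the "compressed" packet is larger than the original.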

The upper curve in Figure 7(a) shows the compression ratio increasing with packet length, yielding a compression gain for packets whose original length is larger than 550 B. In Huffman coding, variable-length data sequences are represented by fixed-size entries in the dictionary; the data sequences are smaller than the corresponding code size at the beginning of the dictionary, whereas near the end of the dictionary the data size is much bigger than the code size. When the packet size is small, most data in a packet are represented by the first entries in the dictionary, and thus the compression ratio is less than 1. For larger packet sizes, good compression ratios are achieved.

A similar result can be found for LZW coding in Figure 7(b); however, LZW maps blocks of variable length into blocks of fixed size, which allows it to obtain better compression than Huffman coding.


Figure 8 plots the compression ratio against packet size for 10,000 UDP/IPv4 packets compressed by either the Huffman or LZW algorithms. The compression ratio is below 1 throughout the entire range of original packet sizes, which means no compression gain is obtained. Further analysis showed that most of the UDP traffic captured was streaming data (e.g. audio or video), which is already compressed by a video/audio codec. Some small (control) packets were compressible.

    6.2 Comparison of techniques

Figure 4 presents a theoretical comparison of the original packet length and compressed packet length for several types of header. ROHC shows a considerable advantage and is consistently capable of compressing all headers to less than 10 B.

Figure 4 Length of original packets and compressed packets for several compression schemes (header lengths in bytes, 0 to 80, for IPv4, IPv6, IPv4/TCP, IPv6/TCP, IPv4/UDP, IPv6/UDP, IPv4/UDP/RTP and IPv6/UDP/RTP; schemes: Original, BULK, VJHC, IPHC, ROHC)

Figure 5 Comparison of the compression ratio of several compression schemes (compression ratio, 0.8 to 1.6, against original packet length, 50 to 1,450 bytes; schemes: BULK, VJHC, IPHC, ROHC)

The compression ratios of several compression schemes are compared in Figure 5. VJHC, IPHC and ROHC show a similar relationship between compression ratio and original packet length. When the packet length is less than 180 bytes, the header compression schemes achieve a higher compression ratio than bulk compression. For larger packets, bulk compression can outperform the other compression schemes, but only where the data has not already been compressed.

The use of IPSEC encryption (ESP with an encrypted payload) is also an obstacle to compression. This prevents the use of VJHC, and forces ROHC to compress only the outer (unencrypted) packet headers. Bulk compression will also fail to compress an encrypted payload. One mitigation is to apply a bulk compression algorithm prior to the IPSEC encryption.

    6.3 Application performance

Header compression increases the efficiency of information delivery, and for low-speed transmission may also reduce the packet transit time. The benefit when using small packets (such as VoIP or TCP ACKs) may be very noticeable.

Figure 6 Bit rate saving based on application (bit saving percentage, 0 to 80%, for FTP, Telnet, HTTP and Multimedia)

Investigating the overall benefit requires a study of the range of packet sizes encountered for different applications. 10,000 packets were captured from an Ethernet LAN, corresponding to IP flows between several applications, and the resulting data were analysed to determine the distribution of packet sizes.

Figure 6 shows the predicted bit rate saving of header compression by application. Since Telnet session packets are usually small, they offer good compression. HTTP, FTP and multimedia traffic usually consist of large packets; thus significant efficiency savings cannot be obtained by solely compressing the protocol headers.

    7 CONCLUSION

Bulk compression is useful when packets carry large volumes of uncompressed data (especially when the payload is larger than 500 bytes). However, when the IP traffic uses encryption or higher-layer compression (e.g. ZIP files, IPCOMP, or multimedia CODECs) there is no benefit from compression. In fact, attempted compression simply wastes sender computing resources.

Header compression is an effective way to reduce the packet header size for small packets (less than approximately 200 B), where compression can provide a significant saving in overall packet size. VJHC can only be applied to TCP/IPv4 packets without options. IPHC cannot achieve a high compression ratio by compressing just the IP header, since the IP header is only one of multiple headers used in a packet. The existing schemes have a number of weaknesses in the packet headers they can compress, and in their robustness to the loss of packets.

In the future, it can be expected that header compression techniques based on the ROHC framework will be able to achieve a reasonable compression ratio without necessarily a significant loss of robustness to packet loss. At the moment, compression schemes for TCP using ROHC have not been defined, and their behaviour with loss is still to be examined. The practical performance of ROHC is therefore an item for further work.

    REFERENCES

[1] S. Deering and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 1883, 1995.

[2] University of Southern California, Marina del Rey, "Internet Protocol", RFC 791, 1981.

[3] D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes", Proceedings of the IRE, 1952.

[4] C. Bormann et al., "RObust Header Compression (ROHC): Framework and four profiles: RTP, UDP, ESP, and uncompressed", RFC 3095, 2001.

[5] HP Company, "HP Case Study: WAN Link Compression on HP Routers", 1995.

[6] V. Jacobson, "Compressing TCP/IP Headers for Low-Speed Serial Links", RFC 1144, 1990.

[7] M. Degermark, B. Nordgren, and S. Pink, "IP Header Compression", RFC 2507, 1999.

[8] M. Degermark, M. Engan, B. Nordgren, and S. Pink, "Low-loss TCP/IP header compression for wireless networks", ACM MobiCom, 1996.

[9] T. A. Welch, "A Technique for High-Performance Data Compression", Computer, pp. 8-18, 1984.

Figure 7 Packet-by-packet compression of TCP/IP packets: (a) Huffman coding, (b) Lempel-Ziv Welch

Figure 8 Packet-by-packet compression of UDP/IP packets: (a) Huffman coding, (b) Lempel-Ziv Welch