    A Review of IP Packet Compression Techniques

Ching Shen Tye and Dr. G. Fairhurst
Electronics Research Group, Department of Engineering,
Aberdeen University, Scotland, AB24 3UE.
{c.tye, gorry}@erg.abdn.ac.uk

    Abstract

This paper investigates several compression techniques that may improve the IP packet delivery process and identifies their limitations. The recent emergence and popularity of the wireless Internet has triggered a demand for improved transmission efficiency, especially where the link has a high cost per transmitted byte. Packet compression at the link layer may speed up the delivery process and provide more efficient use of the available capacity. However, the success of compression depends on several factors: the protocol headers present, the use of encryption, and the type of data being sent.

    1 INTRODUCTION

IP networking is replacing many existing networks, e.g. IP telephony may replace a circuit-switched telephone network. However, IP introduces packet overhead, and this raises the question of efficiency (defined as the ratio between the total number of information bytes and the total number of received bytes). Efficiency is important where the cost of transmission is high. Examples include fixed-rate links where the speed of transmission is limited, or wireless links where there is a cost for using the radio bandwidth. One effect of low efficiency is that there is less capacity available for other services. It also increases the transit delay of the packets across the link (a larger packet takes longer to serialise).
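As a worked illustration of this definition (the packet sizes are chosen for the example, not taken from the paper), a 160 B payload carried behind 40 B of protocol headers gives

\[
\eta = \frac{\text{information bytes}}{\text{received bytes}} = \frac{160}{160 + 40} = 0.8,
\]

i.e. 20% of the link capacity is consumed by overhead.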

Furthermore, IP version 6 (IPv6) [1] is being deployed in many next-generation networks, especially in broadband wireless networks. IPv6 increases the size of an IP address from 32 bits in IPv4 [2] to 128 bits, resulting in a doubling of the minimum IP header to 40 B.

    2 LINK COMPRESSION

One well-known way to improve efficiency is to use data compression [3]. This process attempts to yield a compact digital representation of the information, and sends this in place of the original information.

When compression is used at the link layer, it can improve transmission efficiency. There are two implications. First, there is a computational cost associated with the algorithms for compression (at the sender side of the link) and decompression (at the link receiver). This may require additional processor hardware, and may introduce extra delay. In some cases this cost can be justified in terms of the improved efficiency and reduced bandwidth. A second drawback arises from the way in which compression is performed. To decompress a packet, the decompressor also needs to obtain information about the way in which the compression was performed, which we call the context [4]. For correct decompression, this context information needs to be reliably sent by the compressor to the decompressor.

In this paper, we assume that successfully decompressed packets are semantically identical to the original packets. In practice, this means that the decompressed packets must exactly match the original packets. This implies the use of lossless data compression techniques.

    3 COMPRESSION TECHNIQUES

    3.1 Bulk Compression

The most common form of compression used in computer software is bulk compression. In this technique, all the information in the packets is treated as a block and compressed using a compression algorithm. The compressor constructs a dictionary of common sequences within the information, and matches each sequence to a shorter compressed representation (key strings). Bulk compression may use a pre-defined dictionary (a static context for all IP flows) or a running dictionary built by the compression algorithm (i.e. optimised for a particular IP flow). The receiver must use an identical dictionary for decompression. This method can achieve high compression ratios; however, it has two major drawbacks:


• The dictionary requires a large memory.

• The dictionaries at the compressor and decompressor must be synchronised. A running dictionary system may lose synchronisation (or context) when used over a link subject to packet loss. A loss of synchronisation will cause the receiver to discard all packets until the receiver's dictionary is re-synchronised.

To overcome the limitations on memory and link quality, packet-by-packet dictionary algorithms [5] were invented. These compress each packet individually, sending the context with the packet. This prevents a loss of synchronisation, and also requires a smaller dictionary (less memory). The trade-off is that this achieves lower compression ratios.
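As a minimal Python sketch of the per-packet idea (our illustration, not code from the paper or from [5]), the example below uses LZW [9] and rebuilds the dictionary from scratch for every packet, so no running state is shared across packets and a lost packet cannot desynchronise the receiver:

    def lzw_compress_packet(packet: bytes) -> list[int]:
        # Fresh dictionary per packet: codes 0-255 are the single bytes,
        # so compressor and decompressor share no long-lived state.
        dictionary = {bytes([i]): i for i in range(256)}
        next_code = 256
        w = b""
        codes = []
        for byte in packet:
            wb = w + bytes([byte])
            if wb in dictionary:
                w = wb                       # extend the current match
            else:
                codes.append(dictionary[w])  # emit code for the longest match
                dictionary[wb] = next_code   # learn the new sequence
                next_code += 1
                w = bytes([byte])
        if w:
            codes.append(dictionary[w])
        return codes

Because the dictionary is rebuilt for each packet it stays small, but long matches have less opportunity to accumulate, which is why the compression ratio is lower than with a running dictionary.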

Another kind of compression technique is the Guess-Table-Based compression algorithm [5]. At the sender, this uses an algorithm to guess the next byte(s) of data based on the previous data. If a guess is correct, the byte is not transmitted on the link. Both receiver and transmitter must use the same algorithm (context) to successfully decompress the data at the receiver.
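A minimal sketch of the guess-table idea, loosely modelled on Predictor-style algorithms (the hash function and the 64 K table size are our assumptions, not details given in [5]):

    def guess_compress(packet: bytes, table: list) -> tuple:
        # 'table' is the shared context: it maps a hash of recent bytes to
        # a guess for the next byte. The caller initialises it identically
        # at both ends, e.g. table = [0] * 0x10000.
        flags, literals = [], []
        h = 0
        for byte in packet:
            if table[h] == byte:
                flags.append(1)          # guess correct: byte not sent
            else:
                flags.append(0)
                literals.append(byte)    # guess wrong: send the literal
                table[h] = byte          # both ends learn the same update
            h = ((h << 4) ^ byte) & 0xFFFF
        return flags, literals

The decompressor runs the identical loop: on a 1 flag it outputs table[h], on a 0 flag it consumes the next literal and updates table[h], so both tables stay synchronised as long as no packet is lost.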

4 HEADER COMPRESSION

Bulk compression achieves little benefit when used on protocol header information. The structure of this information varies from packet to packet and from field to field within the packet headers. Standard bulk compression algorithms cannot take advantage of this structure; however, a compression algorithm that understands the syntax (and possibly semantics) of the packet headers may be able to achieve considerable benefit by exploiting the redundancy which is often present in successive packets of the same IP flow. This is called Header Compression.

    4.1 Van Jacobson Header Compression (VJHC)

The Van Jacobson Header Compression scheme [6] relies on knowledge of the TCP/IPv4 headers. The algorithm first classifies packets into individual flows (i.e. packets that share the same set of {IP addresses, IP protocol type, and TCP port numbers}). State (a context) is then created for each flow, and a Context ID (CID) is assigned to identify the flow at the compressor and decompressor. The sender then omits fields in the header that remain unchanged between successive packets in an IP flow (these may be deduced by using the CID in each packet to refer to the context state at the decompressor).
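A small Python sketch of this classification step (the packet attribute names are hypothetical, and a real implementation bounds the table and reuses stale slots rather than letting it grow):

    def flow_key(pkt):
        # Packets sharing these values belong to the same flow.
        return (pkt.src_ip, pkt.dst_ip, pkt.protocol,
                pkt.src_port, pkt.dst_port)

    cid_table = {}   # flow key -> Context ID (CID), mirrored at the decompressor

    def assign_cid(pkt) -> int:
        key = flow_key(pkt)
        if key not in cid_table:
            cid_table[key] = len(cid_table)   # next free CID
        return cid_table[key]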

VJHC compresses the IPv4 and TCP headers together as a combined set of fields. Figure 1 shows the TCP/IPv4 header fields. Within an IP flow, more than half of the fields are unchanged between successive packets. The total length and identification fields are expected to be handled by link framing protocols. The IP checksum can also be re-calculated at the receiver. By suppressing these fields at the compressor and restoring them at the decompressor, VJHC can significantly improve transmission efficiency for the packet header.

Figure 1 TCP/IPv4 header behaviour

Furthermore, the remaining changing fields do not frequently change at the same time, and the compressed header can thus omit them in most cases. The remaining fields usually change only by a small amount between successive packets. A further saving can be achieved by transmitting the difference in the value of a field (i.e. differential encoding) rather than the entire field.
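A sketch of differential encoding for a single changing field (the one-byte/escape wire format below is our simplification, not the exact RFC 1144 encoding):

    def encode_delta(new: int, old: int) -> bytes:
        # Send the change in the field rather than the field itself; small,
        # common changes (e.g. in a 32-bit TCP sequence number) fit in 1 byte.
        d = (new - old) & 0xFFFFFFFF
        if 0 < d < 0x100:
            return bytes([d])                     # 1 byte on the wire
        return b"\x00" + d.to_bytes(4, "big")     # escape: 0, then full delta

    def decode_delta(buf: bytes, old: int) -> int:
        d = int.from_bytes(buf[1:5], "big") if buf[0] == 0 else buf[0]
        return (old + d) & 0xFFFFFFFF

Note that decoding depends on 'old', the value held in the context state; this is exactly why a lost packet invalidates the context, as discussed below.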

VJHC relies on two types of error detection: a CRC at the link layer (to detect corruption of the compressed packet) and the TCP checksum at the transport layer (to detect corruption in the decompressed packet). When errors are detected, the receiver discards the erroneous packet. This creates another problem in the decompression process. Since differential compression techniques are applied, the receiver also loses the context state. The next packet following the discarded packet therefore cannot be decompressed correctly, and must also be discarded. All subsequent packets will be discarded until the next synchronisation (i.e. an uncompressed packet is received, restoring the context state). To limit this error propagation, the receiver may apply the differential sequence number change carried in an incoming compressed packet to the sequence number of the last correctly received packet, and so generate a correct sequence number for the packet following the discarded one.

Errors detected by the TCP checksum of packets received by the destination end host must also be considered. When the end host fails to receive TCP data segments (on the forward path), no TCP acknowledgement is sent. The sender eventually suffers a TCP timeout and resends the missing segment(s); these retransmissions also trigger the compressor to resynchronise the context state.

The combined IPv4 and TCP headers can typically be reduced by VJHC from 40 B to 4 B (i.e. 10% of the original size). The technique significantly improves performance over low-speed (300 to 19,200 bps) serial links.

The main disadvantages of VJHC are the impact of a loss of synchronisation (when not used with a reliable link protocol) and the combined compression of the TCP and IPv4 headers. The scheme does not support recent changes to IP (e.g. ECN, Diffserv) or TCP (e.g. SACK, ECN, the TS option, LFN). The combined compression also prevents the VJHC algorithm from compressing packets with additional headers placed between the IP and TCP headers (e.g. IPSEC AH, or tunnel encapsulations). Nor will it compress IPv6, or other transport protocols (such as UDP).

    4.2 IP Header Compression (IPHC)

Internet Protocol Header Compression [7] uses the same mechanism as VJHC, omitting the unchanged fields in the header. The main difference between IPHC and VJHC is that IPHC compresses only the IP header. This supports any transport protocol or tunnel encapsulation, as well as ECN and IPv6. IPHC also allows extension to multicast and multi-access links. Like VJHC, a 16-bit CID is used for TCP flows, but a larger CID space is assigned to non-TCP flows.

IPHC handles errors in a similar way to VJHC. IPHC can also use other recovery methods to recover from a TCP checksum failure (e.g. the TWICE algorithm [8]). For non-TCP packets, a periodic uncompressed packet is sent to improve the probability of the context information being correct. This additional information reduces efficiency, but helps bound the impact of packet loss.
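A sketch of such a refresh rule for non-TCP flows (the doubling schedule and cap are our illustration; RFC 2507 describes an exponential "compression slow-start" along these lines):

    class RefreshTimer:
        def __init__(self, max_interval: int = 256):
            self.interval = 1        # packets between full headers
            self.since_full = 1      # force a full header on the first packet
            self.max_interval = max_interval

        def next_packet_is_full(self) -> bool:
            # Periodically resend an uncompressed header so a decompressor
            # whose context was invalidated by loss can resynchronise
            # without any feedback channel.
            if self.since_full >= self.interval:
                self.since_full = 0
                self.interval = min(self.interval * 2, self.max_interval)
                return True          # send uncompressed, refreshing the context
            self.since_full += 1
            return False             # send compressed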

IPHC may reduce the IP header to 2 bytes for a non-TCP session and 4 bytes for a TCP session. The main advantage of IPHC is its independence from the transport-layer protocols. The drawback is that IPHC is only half as efficient as VJHC for TCP packets.

5 ROBUST HEADER COMPRESSION (ROHC)

The RObust Header Compression scheme [4] is a new header compression scheme being developed in the ROHC Working Group of the IETF. Compared with the previous schemes (section 4), its major advantages are high robustness and improved efficiency.

A key feature is that the ROHC framework is extensible. This means that new protocols can be added without the need to design a completely new compression protocol. The monolithic approach of VJHC now seems inappropriate, considering the widespread use of tunnel encapsulations, the increasing use of security protocols and the emergence of IPv6. The penalty for flexibility is that ROHC is a complicated technique, absorbing all the existing compression techniques and adding a more sophisticated mechanism to achieve robustness and reliability.

Compression and decompression are treated as finite state machines and can be broken into a series of states. At the compressor, these are Initialization & Refresh (IR), First Order (FO) and Second Order (SO). At the decompressor there are three states: No Context (NC), Static Context (SC) and Full Context (FC).

The link starts with no context state for a flow (CID), and therefore can only perform limited compression. Once the compressor has successfully installed context state at the receiver, it may transit to the next state (e.g. the compressor will gradually transit forward from IR to FO to SO as it becomes sufficiently confident that the decompressor has all the information needed to decompress the header).
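A compact sketch of the compressor-side state machine (the transition triggers are simplified; the actual RFC 3095 conditions depend on the operating mode, confidence counters and feedback):

    COMPRESSOR_STATES = ("IR", "FO", "SO")   # least to most compressed

    class RohcCompressorFsm:
        def __init__(self):
            self.state = 0    # start in IR: send full (uncompressed) headers

        def gained_confidence(self):
            # Move forward one state once sufficiently confident that the
            # decompressor holds the context needed for the next level.
            self.state = min(self.state + 1, len(COMPRESSOR_STATES) - 1)

        def context_damage_suspected(self):
            # Errors or negative feedback push back toward IR.
            self.state = max(self.state - 1, 0)

        @property
        def name(self) -> str:
            return COMPRESSOR_STATES[self.state]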

ROHC defines three operation modes [4]: Unidirectional (U-mode), Bidirectional Optimistic (O-mode) and Bidirectional Reliable (R-mode). All ROHC operation starts in U-mode, and may then transit to O-mode or R-mode depending on the feedback information. U-mode uses a periodic refresh and timeout to update the context (as in IPHC for UDP), O-mode uses the feedback channel for error correction, and R-mode utilises all the available resources to prevent any context loss or loss of synchronisation.


Each operation mode contains all three states (IR, FO and SO), as shown in Figure 2 below. Any errors that occur will force the compressor back to a lower compression state, or gradually lower the operation mode if more errors are detected or propagated.

    Figure 2 ROHC Operation modes

The ROHC basic packet format is shown in Figure 3; it consists of padding, feedback, header information and payload. These four generic fields allow the decompressor to generate feedback information and return it to the compressor quickly.

    Figure 3 ROHC basic packet structure

Padding: Located at the beginning of the compressed header; this aligns headers to byte boundaries.

Feedback: Feedback information travels from the decompressor to the compressor, to assist in synchronisation of the context state.

Header: Compressed header information.

Payload: Packet payload data.

As with the earlier header compression techniques (section 4), ROHC still relies on a link-layer CRC to verify the integrity of the compressed packet. To gain the full benefit, ROHC relies on a feedback acknowledgment channel for error discovery and recovery. In some cases a CRC is also used to protect an entire segment, such as the initialisation and refresh header and ROHC fragmentation.

The main benefit of ROHC is that it may be designed to compress any type of header. For links that may experience loss, it may also be made more robust than existing schemes such as VJHC and IPHC. Moreover, it offers significant benefits for small packets, such as TCP ACKs or VoIP packets, when sent over a link with limited capacity or low transmission speed.

6 EXPERIMENTAL RESULTS

6.1 Packet-by-packet bulk compression

This section investigates the performance of several bulk compression algorithms (section 3) when used over the entire IP packet. 10,000 TCP/IPv4 packets were captured from an Ethernet LAN and compressed by both the Huffman [3] and Lempel-Ziv Welch (LZW) [9] algorithms. The relationship between compression ratio and original packet length for each compression algorithm is shown in Figures 7(a) and (b) respectively.

Figure 7(a) shows two curves corresponding to two distinct types of packet payload. The compression ratio of the packets in the lower curve is consistently lower than 1 throughout the entire range of packet lengths (i.e. these packets are expanded in size by the compression algorithm, rather than being reduced). Packets in this class include those that contain random (or pre-compressed) data, which cannot be successfully compressed by the link compressor using dictionary-based compression techniques such as Huffman coding and LZW.
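As used in these figures (our reading, consistent with "lower than 1" meaning expansion), the compression ratio is

\[
\text{compression ratio} = \frac{\text{original packet length}}{\text{compressed packet length}},
\]

so values below 1 indicate that the "compressed" packet is larger than the original.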

The upper curve in Figure 7(a) shows the compression ratio increasing with packet length, yielding a compression gain for packets whose original length is larger than 550 B. In Huffman coding, variable-length data sequences are represented by fixed-size entries in the dictionary; the data sequences are smaller than the corresponding code size at the beginning of the dictionary, whereas near the end of the dictionary the data size is much bigger than the code size. When the packet size is small, most data in a packet are represented by the first entries in the dictionary, and thus the compression ratio is less than 1. For larger packet sizes, good compression ratios are achieved.

A similar result can be found for LZW coding in Figure 7(b); however, LZW maps blocks of variable length into blocks of fixed size, which allows it to obtain better compression than Huffman coding.


Figure 8 plots the compression ratio against packet size for 10,000 UDP/IPv4 packets compressed by either the Huffman or LZW algorithms. The compression ratio is below 1 throughout the entire range of original packet sizes, which means no compression gain is obtained. Further analysis showed that most of the UDP traffic captured was streaming data (e.g. audio or video), which is already compressed by a video/audio codec. Some small (control) packets were compressible.

    6.2 Comparison of techniques

Figure 4 presents a theoretical comparison of the original packet length and compressed packet length for several types of header. ROHC shows a considerable advantage and is consistently capable of compressing all headers to less than 10 B.

Figure 4 Length of original packets and compressed packets for several compression schemes (header lengths in bytes, 0 to 80, for IPv4, IPv6, IPv4/TCP, IPv6/TCP, IPv4/UDP, IPv6/UDP, IPv4/UDP/RTP and IPv6/UDP/RTP; schemes: Original, BULK, VJHC, IPHC, ROHC)

Figure 5 Comparison of the compression ratio of several compression schemes (compression ratio, 0.8 to 1.6, against original packet length, 50 to 1,450 bytes; schemes: BULK, VJHC, IPHC, ROHC)

The compression ratios of several compression schemes are compared in Figure 5. VJHC, IPHC and ROHC show a similar relationship between compression ratio and original packet length. When the packet length is less than 180 bytes, the header compression schemes achieve a higher compression ratio than bulk compression. For larger packets, bulk compression can outperform the other compression schemes, but only where the data has not already been compressed.

The use of IPSEC encryption (ESP with an encrypted payload) is also an obstacle to compression. This prevents the use of VJHC, and forces ROHC to compress only the outer (unencrypted) packet headers. Bulk compression will also fail to compress an encrypted payload. One mitigation is to apply a bulk compression algorithm prior to the IPSEC encryption.

    6.3 Application performance

Header compression increases the efficiency of information delivery, and for low-speed transmission may also reduce the packet transit time. The benefit when using small packets (such as VoIP or TCP ACKs) may be very noticeable.

Figure 6 Bit rate saving based on application (bit saving percentage, 0 to 80%, for FTP, Telnet, HTTP and Multimedia)

Investigating the overall benefit requires a study of the range of packet sizes encountered for different applications. 10,000 packets were captured from an Ethernet LAN, corresponding to IP flows between several applications, and the resulting data were analysed to determine the distribution of packet sizes.

Figure 6 shows the predicted bit rate saving of header compression by application. Since Telnet session packets are usually small, they offer good compression. HTTP, FTP and multimedia traffic usually consist of large packets; thus significant efficiency savings cannot be obtained by solely compressing the protocol headers.

    7 CONCLUSION

Bulk compression is useful when packets carry large volumes of uncompressed data (especially when the payload is larger than 500 bytes). However, when the IP traffic uses encryption or higher-layer compression (e.g. ZIP files, IPCOMP, or multimedia CODECs) there is no benefit from compression. In fact, attempted compression simply wastes sender computing resources.

Header compression is an effective way to reduce the packet header size for small packets (less than approximately 200 B), where compression can provide a significant saving in overall packet size. VJHC can only be applied to TCP/IPv4 packets without options. IPHC cannot achieve a high compression ratio by compressing just the IP header, since the IP header is only one of multiple headers used in a packet. The existing schemes have a number of weaknesses in the packet headers they can compress, and in their robustness to the loss of packets.

In the future, it can be expected that header compression techniques based on the ROHC framework will be able to achieve a reasonable compression ratio without necessarily a significant loss of robustness to packet loss. At the moment, compression schemes for TCP using ROHC have not been defined, and their behaviour with loss is still to be examined. The practical performance of ROHC is therefore an item for further work.

    REFERENCES

[1] S. Deering and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 1883, 1995.

[2] University of Southern California, Marina del Rey, "Internet Protocol", RFC 791, 1981.

[3] D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes", Proceedings of the IRE, 1952.

[4] C. Bormann et al., "RObust Header Compression (ROHC): Framework and four profiles: RTP, UDP, ESP, and uncompressed", RFC 3095, 2001.

[5] HP Company, "HP Case Study: WAN Link Compression on HP Routers", 1995.

[6] V. Jacobson, "Compressing TCP/IP Headers for Low-Speed Serial Links", RFC 1144, 1990.

[7] M. Degermark, B. Nordgren, and S. Pink, "IP Header Compression", RFC 2507, 1999.

[8] M. Degermark, M. Engan, B. Nordgren, and S. Pink, "Low-loss TCP/IP header compression for wireless networks", ACM MobiCom, 1996.

[9] T. A. Welch, "A Technique for High-Performance Data Compression", Computer, pp. 8-18, 1984.

Figure 7 Packet-by-packet compression of TCP/IP packets: (a) Huffman coding, (b) Lempel-Ziv Welch

Figure 8 Packet-by-packet compression of UDP/IP packets: (a) Huffman coding, (b) Lempel-Ziv Welch