7/28/2019 A Review of IP Packet Compression Techniques
A Review of IP Packet Compression Techniques
Ching Shen Tye and Dr. G. Fairhurst
Electronics Research Group, Department of Engineering,
Aberdeen University, Scotland, AB24 3UE. {c.tye, gorry}@erg.abdn.ac.uk
Abstract
This paper investigates several compression techniques that may improve the IP packet delivery process
and identifies their limitations. The recent emergence and popularity of wireless Internet has triggered
a demand for improved transmission efficiency, especially where the link has a high cost per
transmitted byte. Packet compression at the link layer may speed up the delivery process and provide
more efficient use of available capacity. However, the success of compression depends on several
factors: the protocol headers present, the use of encryption, and the type of data being sent.
1 INTRODUCTION
IP networking is replacing many existing networks,
e.g. IP telephony may replace a circuit switched
telephone network. However, IP introduces packet
overhead, and this raises the question of efficiency
(defined as the ratio between the total number of
information bytes and the total number of received
bytes). Efficiency is important where the cost of
transmission is high. Examples include fixed rate
links where the speed of transmission is limited or
wireless links where there is a cost for using the
radio bandwidth. One effect of low efficiency is that there is less capacity available for other
services. It also increases the transit delay of the
packets across the link (a larger packet takes
longer to serialise).
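To make the definition above concrete, the following Python sketch computes efficiency for a hypothetical 20 B VoIP payload (the header sizes used are the standard minimum sizes for IPv4, IPv6, UDP and RTP; the payload size is purely illustrative):

```python
def efficiency(payload_bytes, header_bytes):
    """Efficiency: information bytes / total transmitted bytes."""
    return payload_bytes / (payload_bytes + header_bytes)

# Hypothetical 20 B VoIP payload behind IPv4/UDP/RTP (20 + 8 + 12 = 40 B of headers)
ipv4_eff = efficiency(20, 20 + 8 + 12)   # 20 / 60, i.e. one third
# Same payload behind IPv6/UDP/RTP (40 + 8 + 12 = 60 B of headers)
ipv6_eff = efficiency(20, 40 + 8 + 12)   # 20 / 80 = 0.25
```

The drop from one third to one quarter shows directly how the larger IPv6 header reduces efficiency for small packets.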
Furthermore, IP version 6 (IPv6) [1] is being
deployed in many next generation networks, and
especially in broadband wireless networks. This
increases the size of an IP address from 32 bits in
IPv4 [2] to 128 bits, doubling the size of the
minimum IP header from 20 B in IPv4 to 40 B in IPv6.
2 LINK COMPRESSION
One well known way to improve efficiency is to use
data compression [3]. This process attempts to
yield a compact digital representation of the
information, and send this in place of the original
information.
When compression is used at the link layer, it can
improve transmission efficiency. There are two
implications: First, there is a computational cost
associated with algorithms for compression (at the
sender side of the link) and decompression (at the link receiver). This may require additional
processor hardware, and may introduce extra delay.
In some cases this cost can be justified in terms of
the improved efficiency and reduced bandwidth. A
second drawback arises from the way in which
compression is performed. To decompress a
packet, the decompressor also needs to obtain
information about the way in which the
compression was performed, which we call
the context [4]. For correct decompression, this
context information needs to be reliably sent by
the compressor to the decompressor.
In this paper, we assume that successfully
decompressed packets are semantically identical to
the original packets. In practice, this means that
the decompressed packets must exactly match the
original packets. This implies the use of lossless
data compression techniques.
3 COMPRESSION TECHNIQUES
3.1 Bulk Compression
The most common form of compression used in
computer software is bulk compression. In this
technique, all the information in the packets is
treated as a block of information and is
compressed using a compression algorithm. The
compressor constructs a dictionary of common
sequences within the information, and matches
each sequence to a shorter compressed
representation (key strings). Bulk compression
may use a pre-defined dictionary (static context
for all IP flows) or a running dictionary based on
the compression algorithm (i.e. optimised for a
particular IP flow). The receiver must use an
identical dictionary for decompression. This
method can achieve high compression ratios; however, it has two major drawbacks:
ISBN: 1-9025-6009-4 2003 PGNet
- The dictionary requires a large memory.
- The dictionaries at the compressor and decompressor must be synchronised. A running dictionary system may lose synchronisation (or context) when used over a link subject to packet loss. A loss of synchronisation will cause the receiver to discard all packets until the receive dictionary is re-synchronised.
To overcome the limitations on memory and link
quality, packet-by-packet dictionary [5]
algorithms were developed. These compress each
packet individually, sending the context with the
packet. This prevents a loss of synchronisation,
and also requires a smaller dictionary (less
memory). The trade-off is that this achieves lower
compression ratios.
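The trade-off between the two context models can be illustrated with Python's zlib as a stand-in for a dictionary-based link compressor (zlib is not the algorithm of [5]; this is only a sketch of the two approaches):

```python
import zlib

def compress_per_packet(packets):
    """Packet-by-packet: each packet is compressed independently, so a
    loss affects only that packet, but the dictionary restarts every time."""
    return [zlib.compress(p) for p in packets]

def compress_shared_context(packets):
    """Running dictionary: one compression context spans all packets, giving
    better ratios, but the decompressor must receive every packet in order
    to stay synchronised."""
    ctx = zlib.compressobj()
    out = []
    for p in packets:
        # Z_SYNC_FLUSH emits a decodable chunk without ending the stream
        out.append(ctx.compress(p) + ctx.flush(zlib.Z_SYNC_FLUSH))
    return out
```

For a flow of similar packets the shared context is far smaller in total, while each per-packet output can be decompressed on its own.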
Another kind of compression technique is the
Guess-Table-Based compression [5] algorithm.
At the sender, this uses an algorithm to guess the
next byte(s) of data based on previous data. If the
guess is correct, the byte is not transmitted on the
link. Both receiver and transmitter must use the
same algorithm (context) to successfully
decompress data at the receiver.
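A minimal Python sketch of this idea follows (modelled loosely on guess-table predictors such as the PPP Predictor scheme; the one-byte guess table and flag encoding are illustrative assumptions, not the algorithm of [5]):

```python
def predictor_compress(data: bytes):
    """Guess each byte from a table indexed by the previous byte; a correct
    guess costs only a flag bit, a miss costs the flag plus the literal."""
    table, prev = [0] * 256, 0
    flags, literals = [], []
    for b in data:
        if table[prev] == b:
            flags.append(1)        # hit: the receiver can regenerate this byte
        else:
            flags.append(0)
            literals.append(b)     # miss: send the byte itself
        table[prev] = b            # both sides update identical tables
        prev = b
    return flags, bytes(literals)

def predictor_decompress(flags, literals):
    table, prev = [0] * 256, 0
    out, it = bytearray(), iter(literals)
    for hit in flags:
        b = table[prev] if hit else next(it)
        table[prev] = b
        out.append(b)
        prev = b
    return bytes(out)
```

Because compressor and decompressor run the same update rule, no dictionary ever crosses the link; only the guesses must stay in step.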
4 HEADER COMPRESSION
Bulk compression achieves little benefit when used on protocol header information. The structure
of this information varies from packet to packet
and from field to field within the packet headers.
Standard bulk compression algorithms cannot
take advantage of this structure; however, a
compression algorithm that understands the syntax
(and possibly semantics) of the packet headers
may be able to achieve considerable benefit by
exploiting the redundancy which is often present
in successive packets in the same IP flow. This is
called Header Compression.
4.1 Van Jacobson Header Compression (VJHC)
The Van Jacobson Header Compression scheme [6]
relies on knowledge of the TCP/IPv4 headers. The
algorithm first classifies packets into individual
flows (i.e. packets that share the same set of {IP
addresses, IP protocol type, and TCP port
numbers}). State (a context) is then created for
each flow and a Context ID (CID) assigned to
identify the flow at the compressor and
decompressor. The sender then omits fields in the
header that remain unchanged between successive packets in an IP flow (these may be deduced by
using the CID in each packet to refer to the context
state at the decompressor).
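The classification step can be sketched as follows (the 16-slot limit mirrors VJHC's default of 16 contexts; the tuple field layout is illustrative):

```python
from typing import Dict, Tuple

# A flow key: (src IP, dst IP, IP protocol, src port, dst port)
Flow = Tuple[str, str, int, int, int]

class FlowTable:
    """Assign a small Context ID (CID) to each flow, as VJHC does, so later
    packets can carry the CID instead of the full identifying fields."""
    def __init__(self, max_contexts: int = 16):
        self.cids: Dict[Flow, int] = {}
        self.max_contexts = max_contexts

    def cid_for(self, flow: Flow) -> int:
        if flow not in self.cids:
            if len(self.cids) >= self.max_contexts:
                raise RuntimeError("no free context slots")
            self.cids[flow] = len(self.cids)   # next free CID
        return self.cids[flow]
```

A real implementation would also recycle CIDs of idle flows; that bookkeeping is omitted here.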
VJHC compresses the IPv4 and TCP header
together as a combined set of fields. Figure 1
shows the TCP/IPv4 header fields. Within an IP
flow, more than half of the fields are unchanged
between successive packets. The total length
and identification fields are expected to be
handled by link framing protocols. The IP
checksum can also be re-calculated at the receiver. By
suppressing these fields at the compressor and
restoring them at the decompressor, VJHC can
significantly improve transmission efficiency for
the packet header.
Figure 1 TCP/IPv4 header behaviour
Furthermore, the remaining changing fields do not
frequently change at the same time, and the
compressed header can thus omit these in most
cases. The remaining fields usually change only by
a small amount in successive packets. A further
saving can be achieved by transmitting the
difference (i.e. differential encoding) in the value
of the field rather than the entire field.
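The differential step can be sketched like this (the field names `seq` and `ipid` are hypothetical examples; a real compressor also needs a compact variable-length encoding for the deltas):

```python
def delta_encode(context: dict, header: dict) -> dict:
    """At the compressor: send only the fields that changed, as differences
    from the stored context, then update the context."""
    deltas = {f: header[f] - context[f] for f in header if header[f] != context[f]}
    context.update(header)
    return deltas

def delta_decode(context: dict, deltas: dict) -> dict:
    """At the decompressor: rebuild the full header from context plus deltas."""
    for f, d in deltas.items():
        context[f] += d
    return dict(context)
```

For a steady TCP flow the sequence number delta is simply the segment size, which fits in far fewer bits than the 32-bit field itself.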
VJHC relies on two types of error detection: a
CRC at the link layer (to detect corruption of the
compressed packet) and the TCP checksum at the
transport layer (to detect corruption in the
decompressed packet). When errors are detected, the
receiver discards the erroneous packet. This creates another problem in the decompression
process. Since differential compression
techniques are applied, the receiver also loses the
context state. The next packet following the
discarded packet can not therefore be
decompressed correctly. It must also be discarded.
All subsequent packets will therefore be discarded
until the next synchronisation (i.e. an uncompressed
packet is received, restoring the context state). To
overcome this error propagation, the receiver should
apply the differential sequence number change from
the incoming compressed packet to the sequence
number of the last correctly received packet, and
generate a correct sequence number for the packet
after the discarded packet.
Errors detected by the TCP checksum in
packets received by the destination end host must
also be considered. When the end host fails to
receive TCP data segments (forward path), no
TCP acknowledgement is sent. The sender
eventually suffers a TCP timeout, and resends the
missing segment(s); these also trigger the
compressor to resynchronise the context state.
Using VJHC, the combined IPv4 and TCP
headers can typically be reduced from
40 B to 4 B (i.e. 10% of the original size). The
technique significantly improves performance over
low speed (300 to 19,200bps) serial links.
The main disadvantages of VJHC are the impact
of loss of synchronisation (when not used with a
reliable link protocol) and the combined
compression of the TCP and IPv4 headers. The
scheme does not support recent changes to IP (e.g.
ECN, Diffserv), or TCP (e.g. SACK, ECN, TS
option, LFN). It also prevents the VJHC algorithm
from compressing packets with additional headers
placed between the IP and TCP headers (e.g.
IPSEC AH, or tunnel encapsulations). It also will
not compress IPv6 headers or other transport
protocols (such as UDP).
4.2 IP Header Compression (IPHC)
Internet Protocol Header Compression [7]
uses the same mechanism as VJHC,
omitting the unchanged fields in the header. The
main difference between IPHC and VJHC is that
IPHC compresses only the IP header. This
supports any transport protocol or tunnel
encapsulation, as well as ECN and IPv6. IPHC also
allows extension to multicast and multi-access links.
Like VJHC, a 16 bit CID is used for TCP flows,
but a larger CID is assigned to non-TCP flows.
IPHC handles errors in a similar way to VJHC.
IPHC can also use other recovery methods to
recover TCP checksum failure (e.g. TWICE
algorithm [8]). For non-TCP packets, a periodical
uncompressed packet is sent to improve the
probability of the context information being
correct. This additional information reduces
efficiency, but helps bound the impact of packet
loss.
IPHC may reduce the IP header to 2 bytes for non-
TCP sessions and 4 bytes for TCP sessions. The
main advantage of IPHC is independence of the
transport layer protocols. The drawback is that IPHC
is only half as efficient as VJHC for TCP packets.
5 ROBUST HEADER COMPRESSION (ROHC)
The RObust Header Compression scheme [4] is a
new header compression scheme, being developed
in the ROHC Working Group of the IETF.
Compared with the previous schemes (in section
4), the major advantages are high robustness and
improved efficiency.
A key feature is that the ROHC framework is
extensible. This means that new protocols can be
added without the need to design a completely
new compression protocol. The monolithic
approach of VJHC now seems inappropriate,
considering the widespread use of tunnel
encapsulations, increasing use of security
protocols and the emergence of IPv6. The penalty
for flexibility is that ROHC is a complicated
technique, absorbing all the existing compression
techniques, and adding a more sophisticated
mechanism to achieve robustness and reliability.
Compression and Decompression are treated as
finite state machines and can be broken into a
series of states. At the compressor, these are
Initialization & Refresh (IR), First Order (FO) and
Second Order (SO). At the decompressor there are
three states: No Context (NC), Static Context (SC)
and Full Context (FC).
The link starts with no context state for a flow
(CID), and therefore can only perform limited
compression. Once the compressor has
successfully installed context state at the receiver,
it may transit to the next state (e.g. the
compressor will gradually move forward from IR
to FO to SO as it becomes sufficiently confident
that the decompressor has all the information to
decompress the header).
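The compressor side of this state machine can be sketched in Python (a toy model only: the confidence counter and threshold are illustrative assumptions, not the transition logic of RFC 3095):

```python
STATES = ["IR", "FO", "SO"]  # Initialization & Refresh, First Order, Second Order

class RohcCompressorFsm:
    """Toy ROHC compressor FSM: move to a higher-compression state after
    enough packets are sent without trouble; fall back on negative feedback."""
    def __init__(self, upward_threshold: int = 3):
        self.state_idx = 0          # start in IR: full headers, builds context
        self.confidence = 0
        self.upward_threshold = upward_threshold

    @property
    def state(self) -> str:
        return STATES[self.state_idx]

    def packet_sent(self) -> None:
        self.confidence += 1
        if self.confidence >= self.upward_threshold and self.state_idx < 2:
            self.state_idx += 1     # confident the decompressor has context
            self.confidence = 0

    def negative_feedback(self) -> None:
        self.state_idx = max(0, self.state_idx - 1)   # drop to repair context
        self.confidence = 0
```

Starting in IR, the sketch climbs to FO and then SO as packets are delivered, and steps back one state when the decompressor reports a problem.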
ROHC defines three operation modes [4]:
Unidirectional (U-mode), Bidirectional Optimistic
(O-mode) and Bidirectional Reliable (R-mode).
All ROHC operations start from U-mode, and then
may transit to O-mode or R-mode depending on
the feedback information. U-mode makes a
periodic refresh and time out to update the context
(as in IPHC for UDP), O-mode uses the feedback
channel for error correction and R-mode utilises
all the available resources to prevent from anycontext loss or loss synchronisation.
Each operation mode contains all three states (IR,
FO and SO), as shown in Figure 2 below. Any
errors that occur will force a return
to a lower compression state, or gradually to a
lower operation mode if more errors are
detected or propagated.
Figure 2 ROHC Operation modes
The ROHC basic packet format is shown in Figure
3; it consists of padding, feedback, header
information and payload. These four generic fields
allow the decompressor to generate feedback
information and send it back to the
compressor quickly.
Figure 3 ROHC basic packet structure
Padding: padding is located at the beginning of
the compressed header; this aligns headers to byte
boundaries.
Feedback: feedback information travels from the
decompressor to the compressor, to assist in
synchronisation of the context state.
Header: compressed header information.
Payload: packet payload data.
Like earlier header compression techniques (section
4), ROHC still relies on a link layer CRC to
verify integrity of the compressed packet. To gain
the full benefit, ROHC relies on a feedback
acknowledgment channel for error discovery and
recovery. In some cases, a CRC is also used to protect
an entire segment, such as the initialisation and
refresh header and ROHC fragmentation.
The main benefit of ROHC is that it may be
designed to compress any type of header. For
links that may experience loss, it may also be
made more robust than existing schemes such as
VJHC and IPHC. Moreover, it offers significant
benefits for small packets such as TCP ACKs or
VoIP packets when sent over a link with limited
capacity or low transmission speed.
6 EXPERIMENTAL RESULTS
6.1 Packet by packet bulk compression
This section investigates the performance of
several bulk compression algorithms (section 3),
when used over the entire IP packet. 10,000
TCP/IPv4 packets were captured from an Ethernet
LAN and compressed by both the Huffman [3] and
Lempel-Ziv Welch (LZW) [9] algorithms. The
relationship between compression ratio and
original packet length for each compression
algorithm is shown in Figure 7(a), (b) respectively.
Figure 7(a) shows two curves corresponding to
two distinct types of packet payloads. The
compression ratios of the packets in the lower curve
are consistently lower than 1 throughout the entire
range of packet lengths (i.e. these packets are
expanded in size by the compression algorithm,
rather than being reduced). Packets in this class
include packets that contain random (or pre-
compressed) data, which cannot be successfully
compressed by the link compressor using the
dictionary based compression techniques, such as
Huffman coding and LZW.
The upper curve in Figure 7(a) shows an increase
in the compression ratio as the packet length
increases, yielding a compression gain for packets
whose original length is larger than 550 B. In
Huffman coding, variable-length data sequences are
represented by fixed-size entries in the dictionary,
so the data is smaller than the corresponding
code size at the beginning of the dictionary, whereas
near the end of the dictionary, the data size is much
bigger than the code size. When the packet size is
small, most data in a packet is represented by the
first entries in the dictionary; thus, the
compression ratio is less than 1. For larger packet
sizes, good compression ratios are achieved.
A similar result can be seen for LZW coding in
Figure 7(b); however, LZW maps blocks of variable
length into blocks of fixed size, which allows it to
obtain better compression than Huffman coding.
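The compression-ratio measurement used in this experiment can be sketched with Python's zlib standing in for the dictionary-based compressors above (zlib is neither Huffman nor LZW, but it shows the same small-packet expansion effect; the two sample payloads are illustrative):

```python
import random
import zlib

def compression_ratio(packet: bytes) -> float:
    """Original size / compressed size; a value below 1 means expansion."""
    return len(packet) / len(zlib.compress(packet))

random.seed(1)
# Small random packet: stands in for encrypted or pre-compressed data
incompressible = bytes(random.randrange(256) for _ in range(100))
# Large highly-redundant packet: compresses well
redundant = b"A" * 1000
```

The fixed per-packet overhead of the compressed format dominates for small packets, which is why the curves only cross 1 at larger packet lengths.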
Figure 8 plots the compression ratio against packet
size for 10,000 UDP/IPv4 packets compressed by
either the Huffman or LZW algorithms. The
compression ratio is below 1 throughout the entire
range of original packet sizes, which means no
compression gain is obtained. Further analysis
showed that most of the UDP traffic captured was
streaming data (e.g. audio or video), which is
already compressed by a video/audio codec. Some
small (control) packets were compressible.
6.2 Comparison of techniques
Figure 4 presents a theoretical comparison of the
original packet length and compressed packet
length with several types of header. ROHC shows
a considerable advantage and is consistently capable
of compressing all headers to less than 10 B.
[Chart: header lengths (0 to 80 bytes) for IPv4, IPv6, IPv4/TCP, IPv6/TCP, IPv4/UDP, IPv6/UDP, IPv4/UDP/RTP and IPv6/UDP/RTP under the Original, BULK, VJHC, IPHC and ROHC schemes.]
Figure 4 Length of original packets and compressed
packets for several compression schemes
[Chart: compression ratio (0.8 to 1.6) versus original packet length (50 to 1450 bytes) for the BULK, VJHC, IPHC and ROHC schemes.]
Figure 5 Comparison of compression ratios of several
compression schemes
The compression ratios of several compression
schemes are compared in Figure 5. VJHC, IPHC
and ROHC show a similar relationship between
compression ratio and original packet length.
When the packet length is less than 180 bytes,
header compression schemes achieve a higher
compression ratio than bulk compression. In the
case of larger packets, bulk compression can
outperform the other compression schemes, but
only where data has not already been compressed.
The use of IPSEC encryption (ESP with encrypted
payload) is also an obstacle to compression. This
prevents use of VJHC, and forces ROHC to only
compress the outer (unencrypted) packet headers.
Bulk compression will also fail to compress an
encrypted payload. One mitigation is to use a bulk
compression algorithm prior to applying the
IPSEC encryption.
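This ordering can be sketched in Python, with a trivial XOR transform standing in for the IPSEC ESP encryption (illustrative only; real ESP uses a proper cipher, and the function names here are hypothetical):

```python
import zlib

def xor_cipher(data: bytes, key: int = 0x5A) -> bytes:
    """Toy stand-in for the ESP encryption transform (NOT real cryptography).
    XOR with a fixed key is its own inverse, so it also 'decrypts'."""
    return bytes(b ^ key for b in data)

def send(payload: bytes) -> bytes:
    # Compress first: ciphertext looks random and would not compress.
    return xor_cipher(zlib.compress(payload))

def receive(wire: bytes) -> bytes:
    return xor_cipher(wire) and zlib.decompress(xor_cipher(wire))
```

Reversing the order (encrypt, then compress) would leave the link compressor facing random-looking data, with no gain.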
6.3 Application performance
Header compression increases the efficiency of
information delivery, and for low speed
transmission may also reduce the packet transit
time. The benefit when using small packets (such
as VoIP or TCP ACKs) may be very noticeable.
[Chart: bit saving percentage (0 to 80%) for FTP, Telnet, HTTP and Multimedia traffic.]
Figure 6 Bit rate saving based on application
Investigating the overall benefit requires a study
of the range of packet sizes encountered for
different applications. 10,000 packets were
captured from an Ethernet LAN corresponding to
IP flows between several applications, and the
resulting data analysed to determine the
distribution of packet sizes.
Figure 6 shows the predicted bit rate saving of
header compression for each application. Since
Telnet session packets are usually small, they
offer good compression. HTTP, FTP and
multimedia traffic usually consist of large packets;
thus, significant efficiency savings cannot be
obtained by solely compressing protocol headers.
7 CONCLUSION
Bulk compression is useful when packets carry
large volumes of uncompressed data (especially
when the payload is larger than 500 bytes).
However, when the IP traffic uses encryption or
higher layer compression (e.g. ZIP files, IPCOMP,
or multimedia CODECs) there is no benefit from
compression. In fact, attempted compression
simply wastes sender computing resources.
Header compression is an effective way to reduce
the packet header size for small packets (less than
approximately 200 B) where compression can
provide a significant saving in overall packet size.
VJHC can only be applied to TCP/IPv4 packets
with no options. IPHC cannot achieve a high
compression ratio by compressing just the IP header,
since the IP header is one of multiple headers used in
a packet. The existing schemes have a number of
weaknesses in the packet headers they can
compress, and their robustness to loss of packets.
In the future, it can be expected that header
compression techniques based on the ROHC
framework will be able to achieve a reasonable
compression ratio without necessarily a
significant loss of robustness to packet loss. At
the moment, compression schemes for TCP using
ROHC have not been defined, and their behaviour
with loss is still to be examined. The practical
performance of ROHC is therefore an item of
further work.
REFERENCES
[1] S. Deering and R. Hinden, Internet Protocol, Version 6 (IPv6) Specification, RFC 1883, 1995.
[2] University of Southern California, Information Sciences Institute, Marina del Rey, Internet Protocol, RFC 791, 1981.
[3] D. A. Huffman, A Method for the Construction of Minimum-Redundancy Codes, Proceedings of the IRE, 1952.
[4] C. Bormann, et al., RObust Header Compression (ROHC): Framework and four profiles: RTP, UDP, ESP, and uncompressed, RFC 3095, 2001.
[5] HP Company, HP Case Study: WAN Link Compression on HP Routers, 1995.
[6] V. Jacobson, Compressing TCP/IP Headers for Low-Speed Serial Links, RFC 1144, 1990.
[7] M. Degermark, B. Nordgren, and S. Pink, IP Header Compression, RFC 2507, 1999.
[8] M. Degermark, M. Engan, B. Nordgren, and S. Pink, Low-loss TCP/IP header compression for wireless networks, ACM MobiCom, 1996.
[9] T. A. Welch, A Technique for High-Performance Data Compression, Computer, pp. 8-18, 1984.
(a) Huffman Coding
(b) Lempel-Ziv Welch
Figure 7 Packet-by-packet compression of
TCP/IP packets
(a) Huffman Coding
(b) Lempel-Ziv Welch
Figure 8 Packet-by-packet compression of
UDP/IP packets