
Page 1: Data Compression Project-Huffman Algorithm

Data Compression Project

Mini Project Report

Submitted to

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

By

Samir Sheriff

Satvik N

In partial fulfilment of the requirements

for the award of the degree

BACHELOR OF ENGINEERING

IN

COMPUTER SCIENCE AND ENGINEERING

R V College of Engineering

(Autonomous Institute, Affiliated to VTU)

BANGALORE - 560059

May 2012


DECLARATION

We, Samir Sheriff and Satvik N, bearing USN numbers 1RV09CS093 and 1RV09CS095 respectively, hereby declare that the dissertation entitled "Data Compression Project", completed and written by us, has not previously formed the basis for the award of any degree, diploma, or certificate of any other University.

Bangalore Samir Sheriff

USN:1RV09CS093

Satvik N

USN:1RV09CS095


R V COLLEGE OF ENGINEERING

(Autonomous Institute Affiliated to VTU)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

This is to certify that the dissertation entitled, “Data Compression Project”, which

is being submitted herewith for the award of B.E is the result of the work completed by

Samir Sheriff and Satvik N under my supervision and guidance.

Signature of Guide

(Name of the Guide)

Signature of Head of Department Signature of Principal

(Dr. N K Srinath) (Dr. B.S Sathyanarayana)

Name of Examiner Signature of Examiner

1:

2:


ACKNOWLEDGEMENT

The euphoria and satisfaction of completing the project would be incomplete without thanking the persons responsible for this venture.

We acknowledge RVCE (Autonomous under VTU) for providing an opportunity to create a mini-project in the 5th semester. We express our gratitude towards Prof. B.S. Satyanarayana, Principal, R.V.C.E, for the constant encouragement and facilities extended in the completion of this project. We would like to thank Prof. N.K. Srinath, HOD, CSE Dept., for providing excellent lab facilities for the completion of the project. We would personally like to thank our project guides, Chaitra B.H. and Suma B., and also the lab in-charge, for providing timely assistance and guidance.

We are indebted to the co-operation given by the lab administrators and lab assistants, who have played a major role in bringing out the mini-project in the present form.

Bangalore

Samir Sheriff

6th semester, CSE

USN:1RV09CS093

Satvik N

6th semester, CSE

USN:1RV09CS095


ABSTRACT

The Project "Data Compression Techniques" is aimed at developing programs that transform a string of characters in some representation (such as ASCII) into a new string (of bits, for example) which contains the same information but whose length is as small as possible. Compression is useful because it helps reduce the consumption of resources such as data space or transmission capacity. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (e.g., when using lossy data compression), and the computational resources required to compress and uncompress the data.

Many data processing applications require storage of large volumes of data, and the

number of such applications is constantly increasing as the use of computers extends to

new disciplines. Compressing data to be stored or transmitted reduces storage and/or

communication costs. When the amount of data to be transmitted is reduced, the effect

is that of increasing the capacity of the communication channel. Similarly, compressing a

file to half of its original size is equivalent to doubling the capacity of the storage medium.

It may then become feasible to store the data at a higher, thus faster, level of the storage

hierarchy and reduce the load on the input/output channels of the computer system.


Contents

ACKNOWLEDGEMENT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i

ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

CONTENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

1 INTRODUCTION 1

1.1 SCOPE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 REQUIREMENT SPECIFICATION 3

3 Compression 4

3.1 A Naive Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

3.2 The Basic Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3.3 Building the Huffman Tree . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3.4 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3.4.1 An Example: ”go go gophers” . . . . . . . . . . . . . . . . . . . . 6

3.4.2 Example Encoding Table . . . . . . . . . . . . . . . . . . . . . . . 8

3.4.3 Encoded String . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

4 Decompression 9

4.1 Storing the Huffman Tree . . . . . . . . . . . . . . . . . . . . . . . . . . 9

4.2 Creating the Huffman Table . . . . . . . . . . . . . . . . . . . . . . . . . 10

4.3 Storing Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

5 CONCLUSION AND FUTURE WORKS 12

BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14


APPENDICES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15


Chapter 1

INTRODUCTION

The Project "Data Compression Techniques" is aimed at developing programs that transform a string of characters in some representation (such as ASCII) into a new string (of bits, for example) which contains the same information but whose length is as small as possible. Compression is useful because it helps reduce the consumption of resources such as data space or transmission capacity. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (e.g., when using lossy data compression), and the computational resources required to compress and uncompress the data.

1.1 SCOPE

The data compression techniques find applications in almost all fields. To list a few,

• Audio data compression reduces the transmission bandwidth and storage requirements of audio data. Audio compression algorithms are implemented in software as audio codecs. Lossy audio compression algorithms, which provide higher compression at the cost of fidelity, are used in numerous audio applications. These algorithms almost all rely on psychoacoustics to eliminate less audible or meaningful sounds, thereby reducing the space required to store or transmit them.


• Video compression uses modern coding techniques to reduce redundancy in video

data. Most video compression algorithms and codecs combine spatial image com-

pression and temporal motion compensation. Video compression is a practical im-

plementation of source coding in information theory. In practice most video codecs

also use audio compression techniques in parallel to compress the separate, but

combined data streams.

• Grammar-Based Codes

They can extremely compress highly-repetitive text, for instance, biological data

collection of same or related species, huge versioned document collection, internet

archives, etc. The basic task of grammar-based codes is constructing a context-

free grammar deriving a single string. Sequitur and Re-Pair are practical grammar

compression algorithms which public codes are available.

Dept. of CSE, R V C E, Bangalore. Feb 2012 - May 2013 2

Page 10: Data Compression Project-Huffman Algorithm

Chapter 2

REQUIREMENT SPECIFICATION

Software Requirement Specification (SRS) is an important part of the software development process. Here we give an overall description of the Data Compression Project, its specific requirements, the software and hardware requirements, and the functionality of the system.

Software Requirements

• Front End: Qt GUI Application.

• Back End: C++

• Operating System: Linux.

Hardware Requirements

• Processor: Intel Pentium 4 or higher version

• RAM: 512MB or more

• Hard disk: 5 GB or less


Chapter 3

Compression

We’ll look at how the string ”go go gophers” is encoded in ASCII, how we might save

bits using a simpler coding scheme, and how Huffman coding is used to compress the

data resulting in still more savings.

3.1 A Naive Approach

With an ASCII encoding (8 bits per character) the 13 character string ”go go gophers”

requires 104 bits. The table below on the left shows how the coding works.


The string "go go gophers" would be written (coded numerically) as 103 111 32 103 111 32 103 111 112 104 101 114 115. Although not easily readable by humans, this would be written as the following stream of bits (the spaces would not be written, just the 0's and 1's): 01100111 01101111 00100000 01100111 01101111 00100000 01100111 01101111 01110000 01101000 01100101 01110010 01110011. Since there are only eight different characters in "go go gophers", it's possible to use only 3 bits to encode the different characters. We might, for example, use the encoding in the table on the right above, though other 3-bit encodings are possible. Now the string "go go gophers" would be encoded as 0 1 7 0 1 7 0 1 2 3 4 5 6 or, as bits: 000 001 111 000 001 111 000 001 010 011 100 101 110. By using three bits per character, the string "go go gophers" uses a total of 39 bits instead of 104 bits. More bits can be saved if we use fewer than three bits to encode characters like g, o, and space that occur frequently, and more than three bits to encode characters like e, p, h, r, and s that occur less frequently in "go go gophers".

3.2 The Basic Idea

This is the basic idea behind Huffman coding: to use fewer bits for more frequently

occurring characters. We’ll see how this is done using a tree that stores characters at

the leaves, and whose root-to-leaf paths provide the bit sequence used to encode the

characters.

We’ll use Huffman’s algorithm to construct a tree that is used for data compression.

We’ll assume that each character has an associated weight equal to the number of times

the character occurs in a file, for example. In the ”go go gophers” example, the characters

’g’ and ’o’ have weight 3, the space has weight 2, and the other characters have weight 1.

When compressing a file we'll need to calculate these weights; we'll ignore this step for now and assume that all character weights have been calculated.


3.3 Building the Huffman Tree

Huffman’s algorithm assumes that we’re building a single tree from a group (or forest)

of trees. Initially, all the trees have a single node with a character and the character’s

weight. Trees are combined by picking two trees, and making a new tree from the two

trees. This decreases the number of trees by one at each step since two trees are combined

into one tree. The algorithm is as follows:

• Begin with a forest of trees. All trees are one node, with the weight of the tree equal

to the weight of the character in the node. Characters that occur most frequently

have the highest weights. Characters that occur least frequently have the smallest

weights.

• Repeat this step until there is only one tree:

• Choose two trees with the smallest weights, call these trees T1 and T2. Create a

new tree whose root has a weight equal to the sum of the weights T1 + T2 and

whose left subtree is T1 and whose right subtree is T2.

• The single tree left after the previous step is an optimal encoding tree.

3.4 An Example

3.4.1 An Example: ”go go gophers”

We’ll use the string ”go go gophers” as an example. Initially we have the forest shown

below. The nodes are shown with a weight/count that represents the number of times

the node’s character occurs.


3.4.2 Example Encoding Table

The character encoding induced by the last tree is shown below where again, 0 is used

for left edges and 1 for right edges.

3.4.3 Encoded String

The string ”go go gophers” would be encoded as shown (with spaces used for easier

reading, the spaces wouldn’t appear in the real encoding). 00 01 100 00 01 100 00 01

1110 1101 101 1111 1100

In total, 37 bits are used to encode "go go gophers", compared with 39 bits for the 3-bit fixed-width encoding and 104 bits for ASCII. There are several trees that

yield an optimal 37-bit encoding of ”go go gophers”. The tree that actually results from

a programmed implementation of Huffman’s algorithm will be the same each time the

program is run for the same weights (assuming no randomness is used in creating the

tree).


Chapter 4

Decompression

Generally speaking, the process of decompression is simply a matter of translating the

stream of prefix codes to individual byte values, usually by traversing the Huffman tree

node by node as each bit is read from the input stream (reaching a leaf node necessarily

terminates the search for that particular byte value). Before this can take place, however,

the Huffman tree must be somehow reconstructed.

4.1 Storing the Huffman Tree

• In the simplest case, where character frequencies are fairly predictable, the tree can

be preconstructed (and even statistically adjusted on each compression cycle) and

thus reused every time, at the expense of at least some measure of compression

efficiency.

• Otherwise, the information to reconstruct the tree must be sent a priori.

• A naive approach might be to prepend the frequency count of each character to

the compression stream. Unfortunately, the overhead in such a case could amount

to several kilobytes, so this method has little practical use.

• Another method is to simply prepend the Huffman tree, bit by bit, to the output

stream. For example, assuming that the value of 0 represents a parent node and 1 a

leaf node, whenever the latter is encountered the tree building routine simply reads


the next 8 bits to determine the character value of that particular leaf. The process

continues recursively until the last leaf node is reached; at that point, the Huffman

tree will thus be faithfully reconstructed. The overhead using such a method ranges

from roughly 2 to 320 bytes (assuming an 8-bit alphabet).

Many other techniques are possible as well. In any case, since the compressed data

can include unused ”trailing bits” the decompressor must be able to determine when to

stop producing output. This can be accomplished by either transmitting the length of

the decompressed data along with the compression model or by defining a special code

symbol to signify the end of input (the latter method can adversely affect code length

optimality, however).

4.2 Creating the Huffman Table

To create a table or map of coded bit values for each character you’ll need to traverse

the Huffman tree (e.g., inorder, preorder, etc.) making an entry in the table each time

you reach a leaf. For example, if you reach a leaf that stores the character ’C’, following

a path left-left-right-right-left, then an entry in the ’C’-th location of the map should be

set to 00110. You’ll need to make a decision about how to store the bit patterns in the

map. At least two methods are possible for implementing what could be a class/struct

BitPattern:

• Use a string. This makes it easy to add a character (using +) to a string during

tree traversal and makes it possible to use string as BitPattern. Your program may

be slow because appending characters to a string (in creating the bit pattern) and

accessing characters in a string (in writing 0’s or 1’s when compressing) is slower

than the next approach.


• Alternatively you can store an integer for the bitwise coding of a character. You need to store the length of the code too, to differentiate between codes such as 1001 and 01001, which are the same integer but different bit patterns. However, using an int restricts root-to-leaf paths to be at most 32 edges long, since an int holds 32 bits. In a pathological file, a Huffman tree could have a root-to-leaf path of over 100 edges. Because of this problem, you should use strings to store paths rather than ints. A slow correct program is better than a fast incorrect program.

4.3 Storing Sizes

The operating system will buffer output, i.e., output to disk actually occurs when some

internal buffer is full. In particular, it is not possible to write just one single bit to a file,

all output is actually done in ”chunks”, e.g., it might be done in eight-bit chunks. In any

case, when you write 3 bits, then 2 bits, then 10 bits, all the bits are eventually written,

but you cannot be sure precisely when they're written during the execution of your

program. Also, because of buffering, if all output is done in eight-bit chunks and your

program writes exactly 61 bits explicitly, then 3 extra bits will be written so that the

number of bits written is a multiple of eight. Because of the potential for the existence

of these ”extra” bits when reading one bit at a time, you cannot simply read bits until

there are no more left since your program might then read the extra bits written due to

buffering. This means that when reading a compressed file, you CANNOT use code like

this.

int bits;

while (input.readbits(1, bits))

{

// process bits

}

To avoid this problem, you can write the size of a data structure before writing the

data structure to the file.


Chapter 5

CONCLUSION AND FUTURE

WORKS

Summary

Limitations

1. Huffman code is optimal only if the exact probability distribution of the source symbols is known.

2. Each symbol is encoded with an integer number of bits.

3. Huffman coding cannot adapt efficiently to changing source statistics.

4. The code of the least probable symbol may be too long to store in a single word or basic storage unit of a computing system.

Further enhancements The Huffman coding we have considered is simple binary Huffman coding, but many variations of Huffman coding exist:

1. n-ary Huffman coding: The n-ary Huffman algorithm uses the {0, 1, ..., n-1} alphabet to encode messages and build an n-ary tree. This approach was considered by Huffman in his original paper. The same algorithm applies as for binary (n = 2) codes, except that the n least probable symbols are taken together, instead of just the 2 least probable. Note that for n greater than 2, not all sets of source words can properly form an n-ary tree for Huffman coding. In this case, additional 0-probability placeholders must be added. If the number of source words is congruent to 1 modulo n-1, then the set of source words will form a proper Huffman tree.

2. Adaptive Huffman coding: A variation called adaptive Huffman coding calculates the probabilities dynamically based on recent actual frequencies in the source string. This is somewhat related to the LZ family of algorithms.

3. Huffman template algorithm: Most often, the weights used in implementations of Huffman coding represent numeric probabilities, but the algorithm given above does not require this; it requires only a way to order weights and to add them. The Huffman template algorithm enables one to use any kind of weights (costs, frequencies, etc.).

4. Length-limited Huffman coding: Length-limited Huffman coding is a variant where the goal is still to achieve a minimum weighted path length, but there is an additional restriction that the length of each codeword must be less than a given constant. The package-merge algorithm solves this problem with a simple greedy approach very similar to that used by Huffman's algorithm. Its time complexity is O(nL), where L is the maximum length of a codeword. No algorithm is known to solve this problem in linear or linearithmic time, unlike the presorted and unsorted conventional Huffman problems, respectively.


Bibliography


Appendices

Appendix A : Source Code

Listing 5.1: The definition of the class Charnode; each node of the Huffman tree is an object of this class.

#ifndef Charnode_h
#define Charnode_h

#define DEBUG 1

#if DEBUG
#define LOG(s) cout << s << endl;
#else
#define LOG(s) //
#endif

using namespace std;

template <class TYPE>
class Charnode
{
    TYPE ch;
    int count;
    Charnode *left;
    Charnode *right;

public:
    Charnode(TYPE ch, int count = 0);
    Charnode(const Charnode *New);
    int GetCount();
    int Value();
    void SetLeft(Charnode *left);
    void SetRight(Charnode *right);
    Charnode *GetLeft(void);
    Charnode *GetRight(void);
    TYPE GetChar(void);
    void show();
    bool operator<(Charnode &obj2);
    void setChar(TYPE ch);
};

template <class TYPE>
Charnode<TYPE>::Charnode(TYPE ch, int count)
{
    LOG("new Charnode " << count << " requested");
    this->ch = ch;
    this->count = count;
    this->left = this->right = NULL;
}

template <class TYPE>
Charnode<TYPE>::Charnode(const Charnode *New)
{
    LOG("new Charnode " << New->count << " requested");
    this->ch = New->ch;
    this->count = New->count;
    this->left = New->left;
    this->right = New->right;
}

template <class TYPE>
int Charnode<TYPE>::GetCount()
{
    return count;
}

template <class TYPE>
int Charnode<TYPE>::Value()
{
    return count;
}

template <class TYPE>
void Charnode<TYPE>::SetLeft(Charnode *left)
{
    this->left = left;
}

template <class TYPE>
void Charnode<TYPE>::SetRight(Charnode *right)
{
    this->right = right;
}

template <class TYPE>
Charnode<TYPE> *Charnode<TYPE>::GetLeft(void)
{
    return left;
}

template <class TYPE>
Charnode<TYPE> *Charnode<TYPE>::GetRight(void)
{
    return right;
}

template <class TYPE>
TYPE Charnode<TYPE>::GetChar(void)
{
    return ch;
}

template <class TYPE>
void Charnode<TYPE>::show()
{
    cout << ch << '\t' << count << endl;
}

template <class TYPE>
bool Charnode<TYPE>::operator<(Charnode &obj2)
{
    return (count < obj2.GetCount());
}

template <class TYPE>
void Charnode<TYPE>::setChar(TYPE ch)
{
    this->ch = ch;
}

#endif

Listing 5.2: The definition of the class Huffman; this class helps in building the Huffman tree for an input file.

#include <iostream>
#include "Charnode.h"
#include "globals.h"
#include "bitops.h"
#include <vector>
#include <map>
#include <fstream>

#ifndef HuffmanCode_h
#define HuffmanCode_h

using namespace std;

template <class TYPE>
class Huffman
{
private:
    vector<Charnode<TYPE> *> charactermap;
    Charnode<TYPE> *huffmanTreeRoot;
    map<TYPE, string> table;
    map<TYPE, int> freqtab;

private:
    void processfile(const char *filename, map<TYPE, int> &charmap);
    vector<Charnode<TYPE> *> convertToVector(map<TYPE, int> &chamap);
    bool compare(Charnode<TYPE> *i, Charnode<TYPE> *j);
    void MinHeapify(vector<Charnode<TYPE> *> &charactermap, int i, const int n);
    void BuildMinHeap(vector<Charnode<TYPE> *> &charactermap);
    void buildHuffmanTree();
    void delNode(Charnode<TYPE> *);

public:
    Huffman();
    Huffman(const char *filename);
    ~Huffman();
    void createHuffmanTable(Charnode<TYPE> *tree, int code, int height);
    void displayCharactermap();
    void displayHuffmanTable();
    Charnode<TYPE> *getRoot();
    map<TYPE, string> getHuffmanTable();
    map<TYPE, int> getFrequencyMap();
    int getCharVecSize();
};

template <class TYPE>
int Huffman<TYPE>::getCharVecSize()
{
    return charactermap.size();
}

template <class TYPE>
void Huffman<TYPE>::processfile(const char *filename, map<TYPE, int> &charmap)
{
    ibstream infile(filename);
    int inbits;
    while (infile.readbits(BITS_PER_WORD, inbits) != false)
    {
        // cout << (TYPE) inbits;
        charmap[(TYPE) inbits]++;
    }
    LOG("\n\n\nEND\n")
}

template <class TYPE>
vector<Charnode<TYPE> *> Huffman<TYPE>::convertToVector(map<TYPE, int> &chamap)
{
    vector<Charnode<TYPE> *> charactermap;
    for (typename map<TYPE, int>::iterator ii = chamap.begin(); ii != chamap.end(); ++ii)
    {
        // cout << (*ii).first << " : " << (*ii).second << endl;
        Charnode<TYPE> *ch = new Charnode<TYPE>((*ii).first, (*ii).second);
        charactermap.push_back(ch);
#if DEBUG
        // ch->show();
        if (ch->GetLeft() == NULL && ch->GetRight() == NULL)
            LOG("Leaf Node initialized properly");
#endif
    }
    return charactermap;
}

template <class TYPE>
bool Huffman<TYPE>::compare(Charnode<TYPE> *i, Charnode<TYPE> *j)
{
    return (*i < *j);
}

template <class TYPE>
void Huffman<TYPE>::MinHeapify(vector<Charnode<TYPE> *> &charactermap, int i, const int n)
{
    int left = 2 * i + 1;
    int right = left + 1;
    int smallest = -1;
    if (left < n && charactermap[left]->Value() < charactermap[i]->Value())
        smallest = left;
    else
        smallest = i;
    if (right < n && charactermap[right]->Value() < charactermap[smallest]->Value())
        smallest = right;
    if (smallest != i)
    {
        Charnode<TYPE> *temp = charactermap[i];
        charactermap[i] = charactermap[smallest];
        charactermap[smallest] = temp;
        MinHeapify(charactermap, smallest, n);
    }
}

template <class TYPE>
void Huffman<TYPE>::BuildMinHeap(vector<Charnode<TYPE> *> &charactermap)
{
    int n = charactermap.size();
    for (int i = n / 2; i >= 0; i--)
        MinHeapify(charactermap, i, n);
}

template <class TYPE>
void Huffman<TYPE>::buildHuffmanTree()
{
    LOG(__func__);
    vector<Charnode<TYPE> *> charactermap = this->charactermap; // Duplicate and change
    /*
       HUFFMAN(C)
       Refer CLRS (non-unicode characters.)
    */
    int n = charactermap.size();
    LOG("Size of the char map = " << n);
    for (int i = 1; i < n; i++)
    {
        LOG(i << "th iteration")
        BuildMinHeap(charactermap);
        Charnode<TYPE> *left = new Charnode<TYPE>(charactermap[0]);
        LOG(left->GetCount());
        charactermap.erase(charactermap.begin() + 0);
        BuildMinHeap(charactermap);
        Charnode<TYPE> *right = new Charnode<TYPE>(charactermap[0]);
        charactermap.erase(charactermap.begin() + 0);
        LOG(right->GetCount());
        Charnode<TYPE> *z = new Charnode<TYPE>('\0', left->Value() + right->Value());
        z->SetLeft(left);
        z->SetRight(right);
        LOG(z->GetCount())
        LOG(z->GetLeft()->GetCount());
        LOG(z->GetRight()->GetCount());
        charactermap.push_back(z);
    }
    huffmanTreeRoot = charactermap[0]; // Initialize the root;
}

template <class TYPE>
Huffman<TYPE>::Huffman()
{}

template <class TYPE>
Huffman<TYPE>::Huffman(const char *filename)
{
    map<TYPE, int> charmap;
    processfile(filename, charmap);
    charactermap = convertToVector(charmap);
    freqtab = charmap;
    buildHuffmanTree();
    createHuffmanTable(huffmanTreeRoot, 0, 0);
}

Dept. of CSE, R V C E, Bangalore. Feb 2012 - May 2013 25

Page 33: Data Compression Project-Huffman Algorithm

Appendix A: Source Code Data Compression Techniques

template <class TYPE>
void Huffman<TYPE>::delNode(Charnode<TYPE> *node)
{
	if (node == NULL)
		return;
	delNode(node->GetLeft());
	delNode(node->GetRight());
	delete node;
}

template <class TYPE>
Huffman<TYPE>::~Huffman()
{
	delNode(huffmanTreeRoot);
	huffmanTreeRoot = NULL;
}

template <class TYPE>
void Huffman<TYPE>::createHuffmanTable(Charnode<TYPE> *tree, int code, int height)
{
	LOG(__func__);

	if (tree == NULL) // This condition never occurs!
		return;

	if (tree->GetLeft() == NULL && tree->GetRight() == NULL)
	{
		// Leaf node: record the character's code (code length = height of the node).
		string codeString = "";
		for (int j = height - 1; j >= 0; j--)
		{
			if (code & (1 << j))
				codeString += "1";
			else
				codeString += "0";
		}
		table[tree->GetChar()] = codeString;
		return;
	}

	code = code << 1;
	createHuffmanTable(tree->GetLeft(), code, height + 1);
	createHuffmanTable(tree->GetRight(), code | 1, height + 1);
}

template <class TYPE>
void Huffman<TYPE>::displayCharactermap()
{
	LOG(__func__);
	int n = charactermap.size();
	LOG("Size = " << n);
	for (int i = 0; i < n; i++)
		charactermap[i]->show();
	cout << endl;
}

template <class TYPE>
Charnode<TYPE> *Huffman<TYPE>::getRoot()
{
	return huffmanTreeRoot;
}

template <class TYPE>
void Huffman<TYPE>::displayHuffmanTable()
{
	LOG("HUFFMAN TABLE");
	for (typename map<TYPE, string>::iterator ii = table.begin(); ii != table.end(); ++ii)
	{
		cout << endl << (*ii).first << "\t" << (*ii).second;
	}
	cout << endl;
}

template <class TYPE>
map<TYPE, string> Huffman<TYPE>::getHuffmanTable()
{
	return table;
}


template <class TYPE>
map<TYPE, int> Huffman<TYPE>::getFrequencyMap()
{
	return freqtab;
}

#endif

Listing 5.3: The definition of the class CompressionWriting. This class helps in writing the bits to the compressed file.

#ifndef COMP_H
#define COMP_H

#include <iostream>
#include <vector>
#include <map>
#include <string>
#include <fstream>

#include "globals.h"
#include "bitops.h"
#include "Charnode.h"

using namespace std;

template <class TYPE>
class CompressionWriting
{
	map<TYPE, string> huffmanTable;
	Charnode<TYPE> *huffmanTreeRoot;
	string outputFilename;
	string inputFilename;
	map<TYPE, int> freqMap;

private:
	int convertStringToBitPattern(string str);
	int totalNumOfBits(void);

public:
	CompressionWriting() {}
	CompressionWriting(Charnode<TYPE> *root, map<TYPE, string> table, map<TYPE, int> freqMap, string oname, string iname);
	void writeCompressedDataToFile();
	void displayOutputFile();
	void writeHuffmanTreeBitPattern(Charnode<TYPE> *tree, obstream &outfile);
};

template <class TYPE>
CompressionWriting<TYPE>::CompressionWriting(Charnode<TYPE> *root, map<TYPE, string> table, map<TYPE, int> freMap, string oname, string iname)
{
	huffmanTreeRoot = root;
	huffmanTable = table;
	outputFilename = oname;
	inputFilename = iname;
	freqMap = freMap;
}

template <class TYPE>
void CompressionWriting<TYPE>::writeCompressedDataToFile()
{
	LOG("\nWriting Pattern:\n");

	ibstream infile(inputFilename.c_str());
	obstream outfile(outputFilename.c_str());

	outfile.writebits(BITS_PER_INT, freqMap.size());      // Write the number of unique characters
	writeHuffmanTreeBitPattern(huffmanTreeRoot, outfile); // Write the Huffman tree
	outfile.writebits(BITS_PER_INT, totalNumOfBits());    // Write the total number of compressed data bits

	// Write the compressed data
	int inbits;
	infile.rewind();
	while (infile.readbits(BITS_PER_WORD, inbits))
	{
		int bitPattern = convertStringToBitPattern(huffmanTable[(TYPE) inbits]);
		outfile.writebits(huffmanTable[(TYPE) inbits].length(), bitPattern);
	}

	outfile.flushbits();
	infile.close();
	outfile.close();
}

template <class TYPE>
int CompressionWriting<TYPE>::totalNumOfBits()
{
	int count = 0;
	for (typename map<TYPE, int>::iterator ii = freqMap.begin(); ii != freqMap.end(); ++ii)
	{
		// Length of each character's code * number of times that character
		// appears = number of bits that character contributes.
		count += huffmanTable[(*ii).first].length() * (*ii).second;
	}
	LOG("Count = " << count << endl);
	return count;
}

template <class TYPE>
int CompressionWriting<TYPE>::convertStringToBitPattern(string str)
{
	int bitPattern = 0;
	int n = str.length();
	for (int i = 0; i < n; i++)
		bitPattern += (1 << (n - i - 1)) * (str[i] - '0');
	return bitPattern;
}

template <class TYPE>
void CompressionWriting<TYPE>::displayOutputFile()
{
	ibstream infile(outputFilename.c_str());
	ofstream outfile("xxx");
	cout << "\nDisplaying Output File:" << endl;
	int inbits;
	while (infile.readbits(1, inbits) != false)
	{
		cout << inbits;
		outfile << inbits;
	}
	outfile.close();
}

template <class TYPE>
void CompressionWriting<TYPE>::writeHuffmanTreeBitPattern(Charnode<TYPE> *node, obstream &outfile)
{
	if (node == NULL)
		return;
	if (node->GetLeft() == NULL && node->GetRight() == NULL)
	{
		outfile.writebits(1, 1);                           // A 1-bit marks a leaf...
		outfile.writebits(BITS_PER_WORD, node->GetChar()); // ...followed by the character itself
	}
	else
	{
		outfile.writebits(1, 0);                           // A 0-bit marks an internal node
		writeHuffmanTreeBitPattern(node->GetLeft(), outfile);
		writeHuffmanTreeBitPattern(node->GetRight(), outfile);
	}
}

#endif

Listing 5.4: The main program of the Huffman compression algorithm.

#include <fstream>
#include <cstdio>
#include <algorithm>
#include <iostream>
#include <cstring>
#include <map>
#include <vector>
#include <cstdlib>

#include "Charnode.h"
#include "HuffmanCode.h"
#include "globals.h"
#include "bitops.h"
#include "CompressionWriting.h"

using namespace std;

int main(int argc, char *argv[])
{
	LOG(__func__);

	if (argc != 3)
	{
		cout << "Usage: " << argv[0] << " Inputfile Outputfile\n";
		exit(0);
	}

	Huffman<char> huff(argv[1]);
	map<char, string> huffmanTable = huff.getHuffmanTable();

	CompressionWriting<char> writingObj(huff.getRoot(), huffmanTable, huff.getFrequencyMap(), argv[2], argv[1]);
	writingObj.writeCompressedDataToFile();

	cout << "Done!" << endl;
}

Listing 5.5: The definition of the class Decompressor. This class helps in decompressing the compressed file using the Huffman algorithm.

#ifndef DECOMP_H
#define DECOMP_H

#include <iostream>
#include <vector>
#include <map>
#include <string>

#include "globals.h"
#include "bitops.h"
#include "Charnode.h"

using namespace std;

template <class TYPE>
class Decompressor
{
	Charnode<TYPE> *huffmanTreeRoot;
	string outputFilename;
	string compressedFilename;
	int numChars;

private:
	inline int readCount(ibstream &ibs);
	void constructTree(Charnode<TYPE> *&, int n, ibstream &ibs);
	void preorder(Charnode<TYPE> *node);

public:
	Decompressor() {}
	Decompressor(string cname, string oname);
	~Decompressor();
	void decompress();
	void delNode(Charnode<TYPE> *);
};

template <class TYPE>
Decompressor<TYPE>::Decompressor(string cname, string oname)
{
	outputFilename = oname;
	compressedFilename = cname;
}

template <class TYPE>
void Decompressor<TYPE>::delNode(Charnode<TYPE> *node)
{
	if (node == NULL)
		return;
	if (node->GetLeft() != NULL)
		delNode(node->GetLeft());
	if (node->GetRight() != NULL)
		delNode(node->GetRight());
	delete node;
}

template <class TYPE>
Decompressor<TYPE>::~Decompressor()
{
	LOG(__func__);
	// delNode(huffmanTreeRoot);
	huffmanTreeRoot = NULL;
}

template <class TYPE>
int Decompressor<TYPE>::readCount(ibstream &ibs)
{
	int count = 0;
	ibs.readbits(BITS_PER_INT, count);
	return count;
}

template <class TYPE>
void Decompressor<TYPE>::preorder(Charnode<TYPE> *node)
{
	if (node == NULL)
		return;
	cout << endl << node->GetChar();
	preorder(node->GetLeft());
	preorder(node->GetRight());
}

template <class TYPE>
void Decompressor<TYPE>::constructTree(Charnode<TYPE> *&node, int n, ibstream &ibs)
{
	if (n == 0)
		return;
	if (node != NULL && node->GetLeft() != NULL && node->GetRight() != NULL)
		return;

	int bitread;
	ibs.readbits(1, bitread);

	if (bitread == 1) // A 1-bit marks a leaf; the next BITS_PER_WORD bits hold the character
	{
		ibs.readbits(BITS_PER_WORD, bitread);
		node = new Charnode<TYPE>((char) bitread);
		n--;
	}
	else // A 0-bit marks an internal node; recurse for both subtrees
	{
		node = new Charnode<TYPE>('\0');
		Charnode<TYPE> *leftnode = node->GetLeft();
		Charnode<TYPE> *rightnode = node->GetRight();
		constructTree(leftnode, n, ibs);
		constructTree(rightnode, n, ibs);
		node->SetLeft(leftnode);
		node->SetRight(rightnode);
	}
}

template <class TYPE>
void Decompressor<TYPE>::decompress()
{
	/* Read and rebuild the tree:
	 * 1) Read the first BITS_PER_INT bits, which hold the number of leaves in the tree.
	 * 2) (Reading the tree contents) A 0 indicates an internal node; when a 1
	 *    is encountered it is a leaf, and the next 8 bits represent that character.
	 * 3) Thus read all the characters and reconstruct the Huffman tree in preorder.
	 * 4) Use the tree to decompress the file.
	 */

	// Step 1
	ibstream compressedFile(compressedFilename.c_str());
	obstream outputFile(outputFilename.c_str());

	int n = readCount(compressedFile);
	LOG("Huffman Tree Size read = " << n);

	// Steps 2 and 3
	huffmanTreeRoot = NULL;
	constructTree(huffmanTreeRoot, n, compressedFile);

	// Step 4
	int i = readCount(compressedFile);
	Charnode<TYPE> *traverser = huffmanTreeRoot;
	while (i)
	{
		int bitread;
		compressedFile.readbits(1, bitread);
		traverser = (bitread) ? traverser->GetRight() : traverser->GetLeft();
		if (traverser->GetLeft() == NULL && traverser->GetRight() == NULL)
		{
			outputFile.writebits(BITS_PER_WORD, traverser->GetChar());
			traverser = huffmanTreeRoot;
		}
		--i;
	}

	outputFile.close();
	compressedFile.close();
}

#endif

Listing 5.6: The main program of the Huffman decompression algorithm.

#include <fstream>
#include <cstdio>
#include <algorithm>
#include <iostream>
#include <cstring>
#include <map>
#include <vector>
#include <cstdlib>

#include "Charnode.h"
#include "globals.h"
#include "bitops.h"
#include "Decompressor.h"

using namespace std;

int main(int argc, char *argv[])
{
	LOG(__func__);

	if (argc != 3)
	{
		cout << "Usage: " << argv[0] << " Inputfile Outputfile\n";
		exit(0);
	}

	Decompressor<char> compressedfile(argv[1], argv[2]);
	compressedfile.decompress();
}

Appendix B: Screen Shots

Figure 5.1: The Data Compression Server window.

Figure 5.2: Creation of a new file from the server window.

Figure 5.3: Compressing a file (google) at the server.

Figure 5.4: Compressing a file (samir.txt) at the server.

Figure 5.5: The Data Compression Client window.

Figure 5.6: The Client after receiving a file from the server.

Figure 5.7: The Client after receiving a file from the server.