[ieee 2010 international conference on recent trends in information, telecommunication and computing...

A Lossless MOD-ENCODER Towards a SecureCommunication

G. Praveen Kumar∗, Biswas Parajuli†, Arjun Kumar Murmu‡, Prasenjit Choudhury§ and Jaydeep Howlader¶National Institute of Technology, Durgapur, India

∗[email protected]’ †[email protected]’ ‡[email protected],§[email protected], ¶howlader [email protected]

Abstract—The Internet traffic is increasing day by day. Also,secure data communication is the need of the hour. But the stan-dard compression techniques which are in use, are independentand do not consider the security issues. Hence, we present ageneral encoding technique for secure data communication overa language L with a finite alphabet set

∑. The encoded message

is a bi-tuple of which, the first is a vector of quotients denotedas Q and the second is a representation of remainders denotedas R with respect to a modulus M . The secrecy of the message isretained by communicating R over a secure channel using somestandard encryption mechanism. The computation overhead isalso reduced as the encryption is done only on one half of theencoded message. Further, this encoding mechanism provides alossless compression.

Keywords: Data Compression, Number Representation, En-cryption, Modulo Arithmetic.

I. INTRODUCTION

The rapid growth of Internet in the recent days and thewide spread availability of networks have lead to the devel-opment of powerful and creative applications. Almost all thesoftware applications are becoming online, not to mention theGoogle Docs and Microsoft Office Live. So, the networkshave become more open and accessible. The volume of datatransmitted over the Internet is also increasing. Presently, wehave ebooks, multimedia, e-business, opensource applications,etc. online. Therefore, the information on the Internet arebecoming more sensitive and vulnerable. Many applicationsdemand confidential data communication between the senderand the receiver. Moreover, such overwhelming internet trafficdemands the efficient use of bandwidth available. So, werequire secure communication with low bandwidth usage. Inthis regard, the role played by Data Compression becomessignificant as it offers an attractive approach for reducing thecommunication costs by using the available bandwidth effec-tively. Furthermore, textual data, where every single charactermatters, cannot be afforded to be compressed with LossyCompression techniques. Hence, ’Lossless Data CompressionAlgorithms’ [1] are applied to ensure the reconstruction of theoriginal data from the compressed data.

The conventional encryption (symmetric or asymmetric)does not provide data compression, i.e. the ciphers are of sameor bigger size as the plaintexts. The symmetric encryptionslike TripleDES [2], AES [3], RC5 [4] are relatively lesscomputational intensive than assymetric encryptions like RSA[5], ElGammal [6], ECC [7]. In general, the plaintexts are

first compressed and then encrypted. So, the overhead ofencryption is reduced to the order of the compressed data.Hence, encryption followed by data compression can reducethe computational overheads of encryption.

Many data compression techniques [1], [8] are used toreduce the size of the data to be communicated. Some of therecently developed ones include [9], [10] and depending uponthe application, the compression may be lossless or lossy.

On the other hand, science of Cryptology, that dates back toCaesars time, provides various heuristics for secured commu-nication. Cryptography encompasses techniques that protectthe secrecy of messages transmitted over public communica-tion lines. Encryption is done at the sender side which usessome secret information (a key) and produces a cipher in sucha way that an eavesdropper cannot interpret. The legitimatereceiver does the Decryption with the corresponding key toget back the message.

Although, as mentioned above, a wide variety of techniqueshave been employed for encryption, research, to the bestof our knowledge, which focuses on Data Compression toreduce encrytpion overheads is less prevalent. Motivated bythis, in this paper, we present a secure data communicationmechanism which uses an encoding technique for better band-width utilization as well as encryption overhead reduction. Theencoding mechanism provides a lossless compression, so that,the original data can be reconstructed at the receiver side.The original message M = m0m1 . . . mn is a string overa language L with finite alphabet set

∑. Let mk ∈ M , a

letter of the string,corresponds to the ith letter of∑

. Themessage M is encoded as a bi-tuple < R,Q >. R represents avector of remainders with respect to a modulus Δ, Δ < |∑ |.That is, the kth element of R is i%Δ. Q represents thecorresponding quotients in a Base B number, where B is themeasure of base to represent the individual quotients. That is,B equals to � |∑ |

Δ �+ 1. All the quotients in Q are within therange [0, B − 1]. The encoded message M can be securelycommunicated when the bi-tuple < R,Q > is transmitted.While R may be transmitted through open channel, Q shouldbe transmitted through a secure channel.

The rest of the paper is organised as follows. In SectionII we present the MOD-ENCODER algorithm with detailedexamples. Compression Mechanism is discussed in Section IIIfollowed by Examples and Discussions in Section IV. Finally,Conclusions of our work are mentioned in Section V.

2010 International Conference on Recent Trends in Information, Telecommunication and Computing

978-0-7695-3975-1/10 $25.00 © 2010 IEEE

DOI 10.1109/ITC.2010.89

330

II. PROPOSED ALGORITHM

The proposed MOD-ENCODER algorithm uses any stan-dard encryption technique which incorporates lossless com-pression in order to cater to the needs of low bandwidth anddata security. As mentioned above, let L be a defined languageover the alphabet set

∑. For example, L be English and

∑be

{A,B, . . . Z, a, b, . . . , z}. Let the letters of∑

be indexed bya bijection function I that maps a letter to an integer i, where1 ≤ i ≤ |∑ |. Δ is a constant, called modulus constant.

Let the data string M be {m1,m2,m3, . . . ,mn} ∈ ∑. Per-

forming modulus operation on every I(mi) by Δ, sequentiallyyields remainder set R as {r1, r2, r3, . . . , rn} and quotient setQ as {q1, q2, q3, . . . , qn}. The elements in R havie the valuesbetween [0,Δ − 1]. So, we consider the elements in R to bea vector of numbers in base R. Each ri takes �log2 R� bitsfor binary representation. If the message is of n characters,the number of bits required to represent the vector R isn × �log2 R�.

The quotient set is represented in a different way. LetB = � |∑ |

Δ �+1 be another parameter called Base-valuer. Theelements of Q will have values within [0, B−1]. Consider Q asa number in base B, i.e. (q1, q2, . . . , qn)B . Convert the numberto a higher base. It is obvious that a higher base representationwould reduce the digits in the number. If B is less then 10,the we convert QB to a Q10 number.The MOD-ENCODER Encoding Algorithm can be stated as :

1) Input: M ∈ ∑∗

2) n = |M | , i.e. length of M3) Z = n × bit size

i.e. bit size is the number of bits require to representeach charecter.

4) for i = 1 to n

4.1) Read mi the ith character from M .4.2) Find R

R[i] = I(mi)%Δ4.3) Find Q

Q[i] = I(mi)/Δ5) Representation of R

for i = 1 to n

5.1) Represent R[i] in Base Δ.6) Representation of Q

Interpret Q as Base B number and convert it to Base 10The vector R is communicated through open channel, whereasQ is encrypted to a cipher Qc using any standard cryptographictechnique and communicated to the receiver to ensure theconfidentiality of the message M . By doing so, the overheadof encryption is reduced as we encrypt only tuple Q, ratherthan the whole message M . The receiver on receiving R andQc, decrypts Qc to Q and decodes the message from the bi-tuple < R,Q >.The MOD-ENCODER Decoding Algorithm as follows:

1) Input: Bi-tuple < R,Q >2) Convert Q from Base 10 to Base B

Let QB = (q1, q2, . . . , qn) be the representation inBase B

3) Interpret R as a vector of Base Δ number4) for 1 ≤ i ≤ n

4.1) i = qi × Δ + ri

where qi the ith digit of QB ri the ith element ofR.

4.2) mi = I−1(i)5) M = (m1,m2, . . . ,mn)

III. COMPRESSION MECHANISM

As mentioned above, the encrypted message M is a bi-tuple < Q,R > of quotient and remainder. So, the size of theencrypted message can be obtained by calculating the numberof bits required to represent Q and R. Let X be the totalnumber of bits required to represent R and is given by X = n∗�log2 Δ�, where n is the length of the message and �log2 Δ�is the number of bits required to represent each remainder. Thequotient Q is looked upon as a Base B number. Each qi needs�log2 B� bits for its representation. As we know, a numberin Base 10 requires lesser number of bits than its equivalentin another Base B for representation, considering B < 10.Therefore, to lessen the number of bits required to representQ, we first convert it into a Base 10 number, say T . So, thenumber of bits required to represent Q is given as Y = log2 T .Hence, the total number of bits needed for represenation of theencrypted message M can be given by X +Y . Considering, a7-bit representation for every character as in ASCII (or 8-bitin Unicode), Z = n∗7 is the total number of bits required forplain text message. We can observe that Z >(X+Y). So thisreduction in bits provides us with the desired compression andthe Compression Ratio C.R. is given by C.R. = X+Y

Z

IV. EXAMPLES AND DISCUSSION

Let us consider∑

to be the alphabet Engilish with 26characters. Let as take a sample message string M as ’I lovemy country’. The following illustrates the functioning of theabove proposed algorithm. Length of M , i.e. n = 14. Numberof bits required for original message = 98 bitsCase 1: Take Δ = 4. TABLE I shows the correspondingquotients and remainders. X=Total number of bits needed for

TABLE IQUOTIENT AND REMAINDER FOR DIFFERENT VALUES OF Δ

Δ = 4 Δ = 5Msg Val Q R Q R

I 8 2 0 1 3L 11 2 3 2 1O 14 3 2 2 4V 21 5 1 4 1E 4 1 0 0 4M 12 3 0 2 2Y 24 6 0 4 4C 2 0 2 0 2O 14 3 2 2 4U 20 5 0 4 0N 13 3 1 2 3T 19 4 3 3 4R 17 4 1 3 2Y 24 6 0 4 4

331

R = 14 ∗ �log2 4� = 28. Now, we convert the Base 7 Quoientto a Base 10 number T. Base B = � 26

4 � = 7T = 491220602Y = Total number of bits required for Q = 29Total number of bits required to represent the encryptedmessage = X + Y = 57Compression Ratio CR = 1.719Case 2: Take Δ = 5. The value of Δ is now varied andTABLE I gives the corresponding quotients and remainders.X = Total number of bits needed for R = 14 ∗ �log2 5� = 42.Now, we convert the Base 6 Quoient to aBase 10 number T.Base B = � 26

5 � = 6T = 5789913061Y = Total number of bits required for Q = 33Total number of bits required to represent the encryptedmessage = X + Y = 75Compression Ratio CR = 1.307

The experimental results as shown in Fig. 1 shows that,for any fixed Δ, if the input message size increases thecompression ratio remains almost constant. This implies thatthe compression ratio is independent to the size of the mes-sage. Fig. 2 shows the comparative results of the degree ofcompression with different Base-values Δ. Here the Q tuple isconverted to the Base 10 number for all the cases. That is, forany quotient Q = (q1, q2, . . . , qn)B the number is convertedto a Base 10 number as Q10 = (q̄1, q̄2, . . . , q̄n). It can berealized that, as B is much smaller than 10, the degree ofcompression is better. Whereas, when B is close to 10 thedegree of compression reduces. For values of B greater than10 there is no compression as the size of encoded messagebecomes bigger than M .

V. SECURITY ANALYSIS

The transmission of tuple Q to the receiver via a securechannel ensures the confidentiality of the message. However,tuple R can be communicated through open channel. So,R becomes easily accessible. Let some intruder capture Ras (r1, r2, . . . , rn) from the open channel. In that case, theprobability that an element ri is interpreted correctly by theintruder is |∑ |

Δ . For example, if∑

is the set of 26 Englishcharacters and Δ is 5, any remainder ri = 3 has 5 possibleoutcomes as {d, i, n, s, x}. The message can be correctlydecoded if the corresponding quotient qi is known. As Qis communicated securely, the intruder can not decode themessage. In general, if the message is of n characters, theunconditional probability for correctly decoding a message is|∑ |Δ

n.

VI. CONCLUSION

The proposed MOD-ENCODER technique represents themessage as a bi-tuple < R,Q >. The individual tuples canbe communicated independently to the receiver. The secrecyof the message is ensured by encoding one half, generally Q,of the bi-tuple. The error probability of decoding is consid-erably high when either Q or R is unknown. The proposedtechnique also provides a lossless compression that facilitates

0

0.5

1

1.5

2

2.5

3

0 1000 2000 3000 4000 5000

Com

pres

sion

Rat

io (

CR

)

Size of Plain Text (n)

DELTA = 4DELTA = 5DELTA = 7

Fig. 1. C.R. vs n

0

0.5

1

1.5

2

2.5

3

0 5 10 15 20

Com

pres

sion

Rat

io (

CR

)

Modulus Constant (Delta)

Fig. 2. C.R. vs Δ

better bandwidth utilization. Moreover, as the encryption isapplied to one half of the encoded message, it reduces thecomputational complexity.

REFERENCES

[1] Debra A. Lelewer, Daniel S. Hirschberg ”Data Compression”, ACMComputing Surveys (CSUR), vol 19, Issue 3, pp. 261 - 296, Sep. 1987

[2] William C. Barker, ”Recommendation for the Triple Data EncryptionAlgorithm (TDEA) Block Cipher”, National Institute of Standards andTechnology, NIST Special Publication 800-67, 2008.

[3] ”Announcing the ADVANCED ENCRYPTION STANDARD(AES)”,Federal Information Processing Standards Publication 197,Nov. 2001

[4] R.L. Rivest, ”The RC5 encryption algorithm”, Proceedings of the 1994Leuven Workshop on Fast Software Encryption, pp. 86-96, Springer-Verlag, 1995.

[5] Ron Rivest, Adi Shamir and Len Adleman, ”A method for obtainingDigital Signatures and Public Key Cryptosystems”, Communications ofthe ACM, pp 120-126, Feb. 1978.

[6] Taher ElGamal, ”A Public-Key Cryptosystem and a Signature SchemeBased on Discrete Logarithms”, IEEE Transactions on InformationTheory, v. IT-31, n. 4, 1985, pp.469472 or CRYPTO 84, pp.1018,Springer-Verlag.

[7] Elliptic Curve Cryptography, Certicom Research, 2000[8] Huffman’s original article: D.A. Huffman, ”A Method for the Construc-

tion of Minimum-Redundancy Codes”, Proceedings of the I.R.E., Sep.1952, pp.10981102

[9] Amit Jain, Ravindra Patel, ”An Efficient Compression Algorithm (ECA)for Text Data”, icsps, pp.762-765, 2009 International Conference onSignal Processing Systems, 2009

[10] Farina, A.; Navarro, G.; Parama, J.R., ”Word-Based Statistical Compres-sors as Natural Language Compression Boosters”, Data CompressionConference 2008, pp. 162 - 171, Mar. 2008

332

[ieee 2010 international conference on recent trends in information, telecommunication and computing...

Documents