data representation

Data Representation, Data Compression & Encryption Group Member: TAY LEONG PING B031110105 NG SING HAN B031110101 NGOH KYE LIAN B031110024 WONG LAM SHEN B031110044 TAN CHING TING B031110241

Upload: chingting

Post on 06-May-2015


TRANSCRIPT

Page 1: Data representation

Data Representation, Data Compression & Encryption

Group Member:

TAY LEONG PING B031110105

NG SING HAN B031110101

NGOH KYE LIAN B031110024

WONG LAM SHEN B031110044

TAN CHING TING B031110241

Page 2: Data representation

Data Representation

Page 3: Data representation

Data Representation

Data representation is, in general, how information is conceived, manipulated, and recorded. The term can also be defined as the form in which data and information are kept in a certain environment. How data is stored varies from one environment to another, with each environment having its own set of rules and standards.

Page 4: Data representation

Data Representation

Data Representation refers to the methods used internally to represent information stored in a computer. Computers store lots of different types of information:

numbers

text

graphics of many varieties (stills, video, animation)

sound

Page 5: Data representation

Data Representation

The problem is that a file containing the bytes 108, 97, 110 would read as “lan” on an ASCII system, but “%/>” on an EBCDIC system.

In ASCII, the value 108 means the character 'l'.

In EBCDIC, the value 108 means the character '%'.

Page 6: Data representation

ASCII - American Standard Code for Information Interchange, represents English on all microcomputers and most minicomputers.

EBCDIC - Extended Binary Coded Decimal Interchange Code, represents English on IBM mainframes.

Shift-JIS - represents Japanese characters.
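The byte example (108, 97, 110) can be checked with a minimal Python sketch. It assumes that the `cp037` codec, one common EBCDIC code page shipped with Python, is a fair stand-in for "an EBCDIC system"; other EBCDIC code pages may map some bytes differently.

```python
# The same three bytes, interpreted under two character encodings.
data = bytes([108, 97, 110])

as_ascii = data.decode("ascii")    # ASCII interpretation
as_ebcdic = data.decode("cp037")   # cp037 is one EBCDIC code page (assumption)

print(as_ascii)   # lan
print(as_ebcdic)  # %/>
```

The bytes never change; only the table used to interpret them does, which is why both sides must agree on a representation.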

Page 7: Data representation

Data Representation

Data representations include:

ASN.1 (Abstract Syntax Notation One) - an ISO standard

XDR (External Data Representation)

- used with SunRPC

Page 8: Data representation

ASN.1

Abstract Syntax Notation One (ASN.1) is a standard notation that describes rules and structures for representing, encoding, transmitting, and decoding data. It consists of two parts:

1. An abstract syntax that describes data structures in an unambiguous way. It uses “integers”, “character strings”, and “structures” rather than bits and bytes.

2. A transfer syntax that describes the bit stream encoding of ASN.1 data objects.

Page 9: Data representation

ASN.1

The standard ASN.1 encoding rules include:

- Basic Encoding Rules (BER)

- Canonical Encoding Rules (CER)

- Distinguished Encoding Rules (DER)

- XML Encoding Rules (XER)

- Packed Encoding Rules (PER)
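To make the idea of a transfer syntax concrete, BER and DER encode each value as a tag, a length, and the value bytes. The following is a minimal sketch for a single INTEGER only; the function name `der_encode_integer` is hypothetical, and long-form lengths (128 content bytes or more) are deliberately left out.

```python
def der_encode_integer(value: int) -> bytes:
    """Encode a Python int as a DER INTEGER: tag, length, content."""
    # Smallest number of bytes that holds the value in two's complement.
    magnitude = value if value >= 0 else ~value
    length = magnitude.bit_length() // 8 + 1
    if length > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    content = value.to_bytes(length, "big", signed=True)
    return bytes([0x02, length]) + content  # 0x02 is the universal INTEGER tag

print(der_encode_integer(5).hex())    # 020105
print(der_encode_integer(128).hex())  # 02020080
```

The tag byte travels with the data, so a receiver can tell that the value is an INTEGER without prior agreement.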

Page 10: Data representation

ASN.1

Example of ASN.1's abstract syntax:

Student ::= SEQUENCE {
    name [0] IMPLICIT OCTET STRING OPTIONAL,
    grad [1] IMPLICIT BOOLEAN DEFAULT FALSE,
    gpa  [2] IMPLICIT REAL OPTIONAL,
    id   [3] IMPLICIT INTEGER,
    bday [4] IMPLICIT OCTET STRING OPTIONAL
}

Page 11: Data representation

Current Uses of ASN.1

Audio & Video over the Internet: AT&T, Intel, IBM, Microsoft, 3COM

Electronic Commerce: American Express, GTE, MasterCard, VISA

Telephony: AT&T, MCI, Motorola, Nokia, Sprint

Aviation: FAA, ICAO

Manufacturing: Ford, Mercedes Benz, Mitsubishi

Network Management: Bull, Compaq, Hewlett-Packard, Sun

Routers: Bay Networks, Cisco, Racal, Xyplex

Page 12: Data representation

External Data Representation (XDR)

External Data Representation (XDR) is much simpler than ASN.1, but less powerful. For instance:

1. XDR uses implicit typing. Communicating peers must know the type of any exchanged data. In contrast, ASN.1 uses explicit typing; it includes type information as part of the transfer syntax.

2. In XDR, all data is transferred in units of 4 bytes. Numbers are transferred in network order, most significant byte first.

Page 13: Data representation

XDR

4 bytes of XDR message:

Page 14: Data representation

XDR

3. Strings consist of a 4-byte length, followed by the data, padded with zero bytes where needed so that the total is a multiple of 4 bytes.

4. Defined types include: integer, enumeration, boolean, floating point, fixed length array, structures, plus others.

One advantage that XDR has over ASN.1 is that current implementations of ASN.1 execute significantly slower than XDR.
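The XDR rules listed above (4-byte units, most significant byte first, length-prefixed strings) can be sketched with Python's `struct` module. The helper names `xdr_int` and `xdr_string` are hypothetical and cover only these two types.

```python
import struct

def xdr_int(value: int) -> bytes:
    """Signed 32-bit integer in network (big-endian) byte order."""
    return struct.pack(">i", value)

def xdr_string(data: bytes) -> bytes:
    """4-byte length, then the data, zero-padded to a multiple of 4 bytes."""
    padding = (-len(data)) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

print(xdr_int(1).hex())             # 00000001
print(xdr_string(b"hello").hex())   # 0000000568656c6c6f000000
```

Nothing in the output says "this is an integer" or "this is a string"; that is the implicit typing described above, so both peers must already know the layout.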

Page 15: Data representation

Multipurpose Internet Mail Extensions (MIME)

The message “£100 is about €150” could become

Content-Transfer-Encoding: quoted-printable

Content-Type: text/plain; charset=ISO-8859-15

MIME-Version: 1.0

=A3100 is about =A4150

Page 16: Data representation

MIME

or

Content-Transfer-Encoding: base64

Content-Type: text/plain; charset=ISO-8859-15

MIME-Version: 1.0

ozEwMCBpcyBhYm91dCCkMTUwCg==
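Both MIME transfer encodings can be reproduced with Python's standard library. This sketch assumes the second currency symbol in the message is the euro sign (byte A4 in ISO-8859-15) and that the message ends with a newline.

```python
import base64
import quopri

# Assumption: the message is "£100 is about €150" plus a trailing newline.
raw = "£100 is about €150\n".encode("iso-8859-15")

quoted = quopri.encodestring(raw)   # quoted-printable: =XX for non-ASCII bytes
b64 = base64.b64encode(raw)         # base64: every 3 bytes become 4 characters

print(quoted)
print(b64)  # b'ozEwMCBpcyBhYm91dCCkMTUwCg=='
```

Quoted-printable keeps mostly-ASCII text readable, while base64 has a fixed overhead of about one third regardless of content.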

Page 17: Data representation

Data Compression

Page 18: Data representation

Data Compression

Data compression is the art of reducing the number of bits needed to store or transmit data.

Compression can be either lossless or lossy.

Page 19: Data representation

Lossless Compression – involves no loss of information. If data have been losslessly compressed, the original data can be recovered exactly from the compressed data. It is generally used for applications that cannot tolerate any difference between the original and the reconstructed data.

Lossy Compression – involves some loss of information, and data that have been compressed using lossy techniques generally cannot be recovered or reconstructed exactly. In return for accepting this distortion in reconstruction, one can generally obtain much higher compression ratios than is possible with lossless compression.
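The defining property of lossless compression, exact recovery, can be illustrated with Python's `zlib` module (DEFLATE). The sample data here is made up for the illustration.

```python
import zlib

# Highly repetitive sample data (an assumption chosen so it compresses well).
original = b"ABABABABABABABAB" * 64

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # compressed size is much smaller
print(restored == original)            # True: nothing was lost
```

A lossy codec would fail the final equality check by design, trading exactness for a smaller output.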

Page 20: Data representation

Steps of Data Compression

The compression of still images, audio and video data streams:

1. Picture preparation – generates an appropriate digital representation of the information in the medium being compressed.

2. Picture processing – is the first step that makes use of the various compression algorithms.

3. Quantization – Values determined in the previous step cannot and should not be processed with full exactness; instead they are quantized according to a specific resolution and characteristic curve.

4. Entropy encoding – with a sequential data stream of individual bits and bytes, different techniques are used to perform a final, lossless compression.
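The quantization step is where information is discarded. A minimal sketch of uniform quantization follows; the step size of 16 and the sample values are assumptions picked for illustration, and real codecs use resolution tables and characteristic curves rather than a single step.

```python
def quantize(values, step):
    """Map each value to the index of its nearest quantization level."""
    return [round(v / step) for v in values]

def dequantize(levels, step):
    """Reconstruct approximate values from the level indices."""
    return [level * step for level in levels]

samples = [12, 47, 103, 201]
step = 16

levels = quantize(samples, step)
restored = dequantize(levels, step)

print(levels)    # [1, 3, 6, 13]
print(restored)  # [16, 48, 96, 208]
```

The small level indices are cheaper to entropy-encode than the original samples, at the cost of an error of at most half a step per value.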

Page 21: Data representation

Steps of Data Compression

Major steps of image compression, which can also be applied to audio and video data:

Uncompressed Picture → Picture Preparation → Picture Processing → Quantization → Entropy Coding → Compressed Picture

Page 22: Data representation

Image Compression

Image compression aims to represent images with less data in order to save storage costs or transmission time.

It is often possible to reduce the file size to about 10% of the original without noticeable loss in quality.

Image compression can be lossless or lossy.

Page 23: Data representation

Image Compression

Lossless - image quality is not reduced.

Used for: artificial images that contain sharp-edged lines, such as technical drawings, textual graphics, comics, maps or logos.

Methods: run-length encoding (RLE), entropy coding (Huffman coding) and dictionary coders (LZW).
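Run-length encoding, the simplest of the lossless methods named above, replaces each run of identical values with a (count, value) pair. This is a minimal sketch; the function names are hypothetical and real image formats pack the runs into bytes rather than Python tuples.

```python
from itertools import groupby

def rle_encode(data: bytes):
    """Collapse each run of equal bytes into a (count, value) pair."""
    return [(len(list(group)), value) for value, group in groupby(data)]

def rle_decode(runs) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    return bytes(value for count, value in runs for _ in range(count))

runs = rle_encode(b"aaaabbbcc")
print(runs)              # [(4, 97), (3, 98), (2, 99)]
print(rle_decode(runs))  # b'aaaabbbcc'
```

RLE works well on artificial images with large uniform areas and poorly on photographs, where long runs are rare.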

Page 24: Data representation

Image Compression

Lossy - reduces image quality. Some information is lost and the original image cannot be recovered.

Used for: natural images such as photos of landscapes.

Methods: discrete cosine transform (DCT, used in JPEG), wavelet transform (used in JPEG 2000), color quantization.

Page 25: Data representation

Comparison of graphics file formats

| Format | File extension | Type of compression | Methods | Usage |
|---|---|---|---|---|
| BMP (bitmap) | .bmp | Can be considerably compressed with lossless methods | ZIP | Used to store bitmap digital images |
| JPEG (Joint Photographic Experts Group) | .jpg, .jpeg, .jpe | Lossy, lossless | Discrete Cosine Transform (DCT) & chroma subsampling; run-length encoding (RLE) | For natural images |
| GIF (Graphics Interchange Format) | .gif, .giff, .gfa | Lossless | LZW (Lempel-Ziv-Welch) | For artificial images (sharp-edged lines and few colors); supports animation |
| PNG (Portable Network Graphics) | .png | Lossless | DEFLATE | Better compression and features than GIF, but does not support animation |
| TIFF (Tagged Image File Format) | .tiff, .tif | Lossless | RLE / LZW / DEFLATE / ZIP | Flexible file format; can store multiple images in a single file |
| JPEG2000 | .jp2, .j2c, .jpc, .j2k, .jpx | Lossy & lossless | Discrete Wavelet Transform (DWT) | Better image quality than JPEG (up to 20%); not widely used because of some patent issues |

Page 26: Data representation

Block Diagram of JPEG Compression

Source image → JPEG compression (DCT → Quantization → Encoding) → Compressed image

Transformation coding is performed using the Discrete Cosine Transform (DCT).

Quantization is applied to all DCT coefficients (a lossy process).

Huffman coding and arithmetic coding are used as entropy encoding methods.
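The transform stage can be sketched with a one-dimensional DCT. This is a simplification: JPEG applies a two-dimensional DCT to 8×8 blocks, and the orthonormal scaling used here is one common convention, chosen as an assumption so that the inverse is straightforward.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * total)
    return out

def idct(c):
    """Inverse of dct(): orthonormal DCT-III."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt((1 if k == 0 else 2) / n)
            total += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

flat = [10] * 8                       # a perfectly uniform row of pixels
coeffs = dct(flat)
print([round(c, 3) for c in coeffs])  # all energy lands in the first coefficient
```

Smooth image regions concentrate their energy in a few low-frequency coefficients, which is what makes the later quantization step effective.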

Page 27: Data representation

Audio Compression

A form of data compression designed to reduce the size of audio files

Audio compression can be lossless or lossy

Audio compression algorithms are typically referred to as audio codecs.

Page 28: Data representation

Audio Compression

Lossless - allows one to preserve an exact copy of one's audio files.

Usage: archival purposes, editing, audio quality.

Codecs:

Free Lossless Audio Codec (FLAC)

Apple Lossless

MPEG-4 ALS

Monkey's Audio

Lossless Predictive Audio Compression (LPAC)

Lossless Transform Audio Compression (LTAC)

Page 29: Data representation

Audio Compression

Lossy - makes irreversible changes and achieves far greater compression. It uses psychoacoustics to exploit the fact that not all data in an audio stream can be perceived by the human auditory system.

Usage: distribution of streaming audio, or interactive applications.

Codecs:

MP2 – MPEG-1 Layer 2 audio codec

MP3 – MPEG-1 Layer 3 audio codec

MPC – Musepack

Vorbis – Ogg Vorbis

AAC – Advanced Audio Coding (MPEG-2 and MPEG-4)

WMA – Windows Media Audio

AC3 – AC-3 or Dolby Digital A/52

Page 30: Data representation

Moving Picture Experts Group (MPEG)

MPEG is an ISO/IEC working group, established in 1988 to develop standards for digital audio and video formats.

MPEG-1: Designed for up to 1.5 Mbit/sec. Standard for the compression of moving pictures and audio. Its most popular part is Layer 3 of MPEG-1 audio (MP3). MPEG-1 is the compression standard for VideoCD.

MPEG-2: Designed for between 1.5 and 15 Mbit/sec. Standard on which digital television set-top boxes and DVD compression are based. Designed for the compression and transmission of digital broadcast television.

Page 31: Data representation

MPEG (cont.)

MPEG-4: Integrates several different audio components into one standard: speech compression, perceptually based coders, text-to-speech, and MIDI. MPEG-4 AAC (Advanced Audio Coding) is similar to the MPEG-2 AAC standard, with some minor changes.

MPEG-7 (under development): Also called the Multimedia Content Description Interface. In terms of audio, it facilitates the representation of and search for sound content. Example application supported by MPEG-7: automatic speech recognition (ASR).

Page 32: Data representation

MPEG Audio Encoding

Uncompressed Audio Signal → Division in 32 Frequency Bands → Quantization → (if applicable) Entropy Encoding → Compressed Audio Data

The Psychoacoustic Model controls the Quantization step.

Page 33: Data representation

Audio Compression

Page 34: Data representation

Audio Compression Format-MP3

Played by almost every portable digital audio device and many DVD players, MP3 is still hard to go past if looking for maximum compatibility for your files.

Although you can get much better compression from other formats, hard disks and blank CDs are cheap enough to justify the extra file size.

Stereo imaging is not terrific and encoding quality differs from one software package to another.

Compression: 5.

Quality: 7.

Compatibility: 10.

Overall: 7.5.

Page 35: Data representation

Audio Compression Format-WMA

Windows Media Audio is Microsoft's contribution to high quality, lossy audio compression. Like most other new formats, it outperforms MP3 in terms of quality and compression, particularly at lower bitrates.

WMA is probably the format of choice for streaming at low bandwidths. Like MP3, however, the stereo imaging is not very accurate.

WMA tends to overcompensate for its high compression with what is often called 'overbrightness'.

Compression: 8.

Quality: 7.

Compatibility: 9.

Overall: 8.

Page 36: Data representation

Audio Compression Format- Ogg Vorbis

Ogg Vorbis is a project attempting to replace all proprietary audio formats with an open standard, freeware codec. Version one was released within the past fortnight; it has been shown to deliver very high quality and outperforms MP3 by a long shot.

At low bitrates it doesn't compete with WMA, and at high bitrates it falls short of MPC. Given that it is a work in progress, however, it has strong potential to become a widely used audio codec.

Some portable device manufacturers are promising to support Ogg Vorbis in future software releases.

Compression: 8.

Quality: 7.

Compatibility: 6.

Overall: 7.

Page 37: Data representation

Video Compression

Storing and transmitting uncompressed raw video is not an efficient technique because it needs large amounts of storage and bandwidth.

DVD, DSS, and internet video, all use digital data → take a lot of space to store and large bandwidth to transmit.

Video compression techniques are used to compress the data for these applications → less storage space and less bandwidth to transmit data.

Page 38: Data representation

Video Compression

Videos are sequences of images displayed at a high rate. Each of these images is called a frame.

The human eye cannot notice small changes between frames, such as a slight difference in color.

Therefore, video compression standards do not require every detail to be encoded, and some of the less important video details are lost. Lossy compression is used because it can achieve very high compression ratios.

Typically 30 frames are displayed on the screen every second. 
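A quick calculation shows why raw video is impractical. The 30 frames per second comes from the text; the 640×480 resolution and 24 bits per pixel are assumptions chosen only to make the arithmetic concrete.

```python
def raw_video_bitrate(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Bits per second needed to send every pixel of every frame uncompressed."""
    return width * height * bits_per_pixel * fps

# Assumed frame size and color depth; 30 fps as stated above.
bitrate = raw_video_bitrate(640, 480, 24, 30)

print(bitrate)              # 221184000 bits per second
print(bitrate / 1_000_000)  # 221.184 Mbit/sec
```

Under these assumptions the raw stream is roughly 147 times the 1.5 Mbit/sec that MPEG-1 was designed for, which is the gap compression has to close.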

Page 39: Data representation

Video Compression Process

1. Start by encoding the first frame using a still image compression method.

2. Then encode each successive frame by identifying the differences between the frame and its predecessor, and encoding these differences. If the frame is very different from its predecessor, it should be coded independently of any other frame.

3. In the video compression literature, a frame that is coded using its predecessor is called an inter frame (or just inter), while a frame that is coded independently is called an intra frame (or just intra).
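The process above can be sketched on toy frames represented as lists of pixel values. This is a lossless illustration only: the function names and the 50% change threshold are assumptions, and real codecs add motion compensation and lossy coding of the differences.

```python
def encode_frames(frames, threshold=0.5):
    """Code each frame as 'intra' (complete) or 'inter' (changes vs. predecessor)."""
    encoded = []
    previous = None
    for frame in frames:
        changes = None
        if previous is not None:
            changes = {i: v for i, (v, p) in enumerate(zip(frame, previous)) if v != p}
        # First frame, or too different from its predecessor: code independently.
        if changes is None or len(changes) > threshold * len(frame):
            encoded.append(("intra", list(frame)))
        else:
            encoded.append(("inter", changes))
        previous = frame
    return encoded

def decode_frames(encoded):
    """Rebuild all frames by applying inter-frame changes to the previous frame."""
    frames = []
    for kind, payload in encoded:
        if kind == "intra":
            frame = list(payload)
        else:
            frame = list(frames[-1])
            for index, value in payload.items():
                frame[index] = value
        frames.append(frame)
    return frames

video = [[1, 1, 1, 1], [1, 2, 1, 1], [9, 9, 9, 8]]
encoded = encode_frames(video)
print([kind for kind, _ in encoded])  # ['intra', 'inter', 'intra']
```

The second frame is stored as a single changed pixel, while the third differs so much that coding it independently is cheaper.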

Page 40: Data representation

Video Compression Techniques

Flow Control and Buffering

Temporal Compression

Spatial Compression 

Discrete Cosine Transform (DCT)

Vector Quantization (VQ)

Fractal Compression

Discrete Wavelet Transform (DWT).

Page 41: Data representation

Video Compression Formats

ISO/IEC, the International Organization for Standardization and the International Electrotechnical Commission, has a group called the Moving Picture Experts Group, or MPEG. MPEG is responsible for the familiar compression formats MPEG-1, MPEG-2 and MPEG-4.

The ITU-T standardizes formats for the International Telecommunication Union, a United Nations organization. Some popular ITU-T compression formats include the H.261 and H.264 formats.

There are other compression formats, such as Intel Indeo and RealVideo (based on the ITU-T H.263 codec), AVI, DivX, QuickTime, and Windows Media Video (WMV).

Page 42: Data representation

Encryption

Page 43: Data representation

Encryption

• To carry sensitive information, a system must be able to assure privacy.

• As the number of attacks increases and as the public Internet is used to transmit private data, it is increasingly difficult to protect information.

• One way to safeguard data from attacks is to encrypt the data.

• In practice, encryption is commonly performed at the presentation layer, as well as at the transport and physical layers.

Page 44: Data representation

Encryption

Encryption –  the conversion of data into a form, called a ciphertext, that cannot be easily understood by unauthorized people.

Decryption – the process of converting encrypted data back into its original form, so it can be understood.

Page 45: Data representation

Example of Encryption / Decryption Process

Page 46: Data representation

Basic Terms and Concepts

Cryptography – The science of encrypting or hiding secrets

Cryptosystem – a system that disguises messages so that only selected people can see through the disguise.

Cryptanalysis – The science of decrypting messages or breaking codes and ciphers

Key – a value that is used by an algorithm to encrypt and decrypt a message.

Cipher – an encryption/decryption algorithm that is used to create encrypted or decrypted text.

Page 47: Data representation

Encryption/Decryption Keys

Symmetric Keys – Also called secret key encryption. It uses a single key to encrypt and decrypt the message. This means the person encrypting the message must give that key to the recipient before they can decrypt it. Examples: Data Encryption Standard (DES), Triple DES (3DES), Advanced Encryption Standard (AES).
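The single-key property can be shown with a toy cipher. This sketch uses a repeating-key XOR, which is not DES, 3DES, or AES and is not secure; it is swapped in only because it needs nothing beyond the standard library. The key and message are made up.

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte with the repeating key (NOT secure)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"shared-secret"                       # both parties must hold this key
ciphertext = xor_cipher(b"attack at dawn", key)
plaintext = xor_cipher(ciphertext, key)      # the same key undoes the encryption

print(ciphertext != b"attack at dawn")  # True
print(plaintext)                        # b'attack at dawn'
```

The example makes the key distribution problem visible: anyone who obtains `key` can both read and forge messages.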

Page 48: Data representation

Asymmetric Keys – Also called public key encryption. It uses two different keys: a public key to encrypt the message and a private key to decrypt it. A message encrypted with the public key can only be decrypted with the matching private key.
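The two-key idea can be sketched with textbook RSA on tiny numbers. RSA is used here as one well-known public key scheme, which is an assumption since the text names no algorithm; the primes 61 and 53 are far too small for real use, and practical RSA also requires padding. The modular inverse via `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# Toy key generation with tiny primes (illustration only, NOT secure).
p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent: inverse of e modulo phi

public_key = (e, n)        # can be given to anyone
private_key = (d, n)       # kept secret by the recipient

def apply_key(value: int, key) -> int:
    """Modular exponentiation with the given (exponent, modulus) key."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

message = 65
ciphertext = apply_key(message, public_key)
recovered = apply_key(ciphertext, private_key)

print(ciphertext)  # 2790
print(recovered)   # 65
```

Unlike the symmetric case, the sender never needs the secret: knowing `public_key` is enough to encrypt but not to decrypt.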

Page 49: Data representation

How Encryption Protects

Confidentiality - Allow only authorized users to access information.

Authentication - Verify who the sender was and trust the sender is who they claim to be.

Integrity - Trust the information has not been altered

Nonrepudiation - Ensure that the sender or receiver cannot deny that a message was sent or received.

Access Control - Restrict availability to information.

Page 50: Data representation

Advantages of Encryption

If a file is encrypted, the device that holds it does not itself need to be secure. Because the data is encrypted, the means of storing or transporting it does not need to be secured, which saves money on extra protection software.

Having the data encrypted takes away much of the pain and worry associated with data breaches and the protection of intellectual property.

Encryption keeps data from snoopers without compromising systems or storage devices.

Page 51: Data representation

Disadvantages of Encryption

Computer encryption is complex and usually expensive, it can be changed easily, and data that has been encoded is difficult to organize. Even though encrypted data needs less additional protection, managing the encryption puts a lot of pressure on IT employees.

If you forget your passphrase and/or keyfile, there is almost no chance of recovering your data.

Encryption takes a lot of processing, energy and computing power. This means that even though the data is protected, the overall performance of the computer could drop.

Encryption won't stop hackers or viruses, and it may make the encrypted file hard to use, as some restrictions may have been placed on it.