
Applications of Data Hiding in Digital Images

Tutorial for ISSPA'99, Brisbane, Australia, August 22-25, 1999

Jessica Fridrich

Center for Intelligent Systems, SUNY Binghamton, Binghamton, NY 13902-6000, U.S.A.
and
Mission Research Corporation, 1720 Randolph Rd. SE, Albuquerque, NM 87105, U.S.A.

Fax/Ph: (607) 777-2577, E-mail: [email protected], Web: www.ssie.binghamton.edu/fridrich

Outline

• Introduction to Data Hiding
- History
- Motivation
- Definition
- Terminology
- Properties
• Covert communication (steganography)
• Digital watermarking (robust message embedding)
• Watermarking for tamper detection and authentication
• Attacks on hiding schemes
• Open problems, challenges

Data Hiding in Digital Imagery

• A relatively young and fast-growing field
• Well over 90% of all publications were published in the last 6 years
• Highly multidisciplinary field combining image and signal processing with cryptography, communication theory, coding theory, signal compression, and the theory of visual perception
• Tremendous interest from industry and the military

Data Hiding - History

• First techniques included invisible ink, secret writing using chemicals, templates laid over text messages, microdots, changing letter/word/line/paragraph spacing, changing fonts
• Images, video, and audio files provide sufficient redundancy for effective data hiding
• PostScript files, PDF files, and HTML can also be used for non-robust data hiding to a limited extent
• Executable files provide very little space for data hiding
• Fonts

The Need for Data Hiding

• Covert communication using images (secret message is hidden in a carrier image)
• Ownership of digital images, authentication, copyright
• Data integrity, fraud detection, self-correcting images
• Traitor-tracing (fingerprinting video-tapes)
• Adding captions to images, additional information, such as subtitles, to video, embedding subtitles or audio tracks to video (video-in-video)
• Intelligent browsers, automatic copyright information, viewing a movie in a given rated version
• Copy control (secondary protection for DVD)

Covert communication

Copyright protection of images (authentication)

Fingerprinting (traitor-tracing)

Adding captions to images, additional information, such as subtitles, to videos

Image integrity protection (fraud detection)

Copy control in DVD

Intelligent browsers, automatic copyright information, viewing movies in given rated version

Requirements

[Figure: scales running from Low to High for capacity, robustness, invisibility, security, embedding complexity, and detection complexity; labels: Requirements, Application]

make data hiding possible

[Figure: Original + 2 gray levels, Original + 5 gray levels, and Original + 31 gray levels, each shown with the resulting image]

• Information-theoretic: removed by lossless compression
• Perceptual: removed by lossy compression

• Relationship carrier - message

• Who extracts the message? (source versus destination coding)

• How many recipients are there?

• Is the key public knowledge or a shared secret?

• Do we embed different messages into one carrier?

• Is embedding / detection bundled with a key in tamper-proof hardware?

• Is the speed of embedding / detection important?

Data Hiding - Definition

[Diagram: Secret message and Carrier document → Embedding algorithm (with Key) → Transmission via network → Detector (with Key) → Secret message]

Robustness: The ability to extract hidden information after common image processing operations: linear and nonlinear filters, lossy compression, contrast adjustment, recoloring, resampling, scaling, rotation, noise adding, cropping, printing / copying / scanning, D/A and A/D conversion, pixel permutation in a small neighborhood, color quantization (as in palette images), skipping rows / columns, adding rows / columns, frame swapping, frame averaging (temporal averaging), etc.

Undetectability: Impossibility to prove the presence of a hidden message. This concept is inherently tied to the statistical model of the carrier image. The ability to detect the presence does not automatically imply the ability to read the hidden message. Undetectability should not be mistaken for invisibility, a concept related to human perception.

Invisibility: Perceptual transparency. This concept is based on the properties of the human visual system or the human auditory system.

Security: The embedded information cannot be removed beyond reliable detection by targeted attacks based on full knowledge of the embedding algorithm and the detector (except the secret key), and knowledge of at least one carrier with a hidden message.

Properties of hiding schemes

The “Magic” Triangle

[Figure: triangle with corners Undetectability, Robustness, and Capacity; marked on it: Secure steganographic techniques, Digital watermarking, Naïve steganography]

There is a trade-off between capacity, invisibility, and robustness.

Additional factors:
• Complexity of embedding / extraction
• Security

Outline

• Introduction
• Covert communication (steganography)
Message hiding in RGB images
- Absolutely secure steganographic method
- LSB encoding
Message hiding in palette images
- Permuting the palette
- LSB encoding in the palette
- EZ Stego
- Improved EZ Stego
• Digital watermarking (robust message embedding)
• Watermarking for tamper detection and authentication
• Attacks on watermarks
• Open problems, challenges

Covert Communication

Purpose: To conceal the very presence of communication, to make the communication invisible.
Encryption: To make the message unintelligible.

[Figure: Andy, Bob, and Warden Willie, with the speech bubbles "Secret communication??!!" and "I just posted a picture of my cat on my web page!"]

Covert Communication

[Diagram: Secret message → Encryption Unit → Embedding Algorithm (with Carrier Image) → Modified Carrier]

- Encryption and steganography provide double protection
- Randomized message is easier to hide

Absolutely secure steganographic technique

Method: Embed a small message (8 bits) by repeatedly scanning a cover image until a certain password-dependent message-digest function returns the required 8-tuple of bits.

Comments:
• Absolute secrecy tantamount to the one-time pad used in cryptography
• Guarantees correct noise distribution and undetectability
• Time consuming, very limited capacity, not applicable to image carriers for which we only have one copy
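The cover-selection idea can be sketched in a few lines. This is only an illustration under stated assumptions: HMAC-SHA256 stands in for the "password-dependent message-digest function" (the slides do not name one), and the helper names `digest_byte` and `pick_cover` are hypothetical. Each candidate is the raw bytes of one fresh scan of the same picture; no candidate is ever modified.

```python
import hashlib
import hmac


def digest_byte(password: bytes, image_bytes: bytes) -> int:
    """Keyed digest of an image, reduced to its first 8 bits (assumed choice: HMAC-SHA256)."""
    return hmac.new(password, image_bytes, hashlib.sha256).digest()[0]


def pick_cover(password: bytes, message_byte: int, scans):
    """Return the first unmodified scan whose keyed digest equals the 8-bit message.

    `scans` is any iterable of image byte strings, e.g. repeated scans of one photo.
    Returns None if no scan matches, in which case more scans are needed.
    """
    for image_bytes in scans:
        if digest_byte(password, image_bytes) == message_byte:
            return image_bytes
    return None


def read_message(password: bytes, image_bytes: bytes) -> int:
    """The receiver recomputes the same keyed digest to read the 8-bit message."""
    return digest_byte(password, image_bytes)
```

With a well-behaved digest, each scan matches a given byte with probability 1/256, so a few hundred scans per message byte are typically needed, which is why the method is slow and its capacity so small.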

Steganography for RGB images

Method:
• Replace the LSB of each pixel with the secret message
• Pixels may be chosen randomly according to a secret key
• Pixels may be chosen adaptively according to neighborhood
• Message should always be encrypted

Comments:
• The simplest and most common steganographic technique
• Premise: changes to the least significant bit will be masked by noise commonly present in digital images
• Color images provide more room for hiding messages
• If more than one LSB is used, statistically detectable changes may result
• A provably secure method should introduce changes consistent with the noise model
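A minimal sketch of key-driven LSB replacement follows. It assumes the image is given as a flat list of 8-bit channel values (R, G, B, R, G, B, ...) and that the message bits have already been encrypted. The function names are hypothetical, and Python's `random.Random` is used only as a stand-in for a key-dependent pixel selection; it is not a cryptographically secure generator.

```python
import random


def sample_order(n_samples: int, key: int):
    """Key-dependent visiting order of the samples (stand-in for a secure PRNG)."""
    order = list(range(n_samples))
    random.Random(key).shuffle(order)
    return order


def lsb_embed(samples, bits, key):
    """Replace the LSB of key-selected samples with the message bits."""
    if len(bits) > len(samples):
        raise ValueError("message longer than carrier")
    out = list(samples)
    for pos, bit in zip(sample_order(len(out), key), bits):
        out[pos] = (out[pos] & ~1) | bit  # clear the LSB, then set it to the bit
    return out


def lsb_extract(samples, n_bits, key):
    """Read the LSBs back in the same key-dependent order."""
    order = sample_order(len(samples), key)
    return [samples[pos] & 1 for pos in order[:n_bits]]
```

Every sample changes by at most one gray level, which is the premise of the method: the change is meant to disappear in the noise already present in the image.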

LSB Encoding (Least Significant Bit)

Steganography for RGB images

Steganography for palette images

LSB encoding cannot be directly applied to palette-based images because new colors, that are not present in the palette, would be created.

Two sources of palette images:
1. Color truncation + dithering of photographs
2. Computer generated images (fractals, cartoons, animations)

A secure steganographic method will produce modified carriers compatible with the source

Possibilities
• Hiding in the palette
• Hiding in the image data
- Non-adaptive techniques
- Adaptive techniques

Artifacts
• Palette artifacts
• Image data artifacts

Possible approaches

Permuting palette entries
- Image is not modified
- Very limited capacity of $\log_2(256!) \approx 1684$ bits (about 210 bytes)
- Too fragile (resaving)
- Suspicious palette order is an artifact

LSB encoding in the palette
- Very limited capacity (at most $3 \times 256 = 768$ bits)
- Palette artifacts?

Message hiding in the palette

Common disadvantage: Capacity is severely limited and independent of the image size
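To make the permutation capacity concrete, the sketch below computes log2(N!) and hides an integer in the order of the palette entries using the factorial number system. This is one possible encoding, not necessarily the one behind the slides. It assumes all palette colors are distinct, and that the pixel indices are remapped afterwards so the displayed image is unchanged (not shown). All function names are hypothetical.

```python
import math


def capacity_bits(n_colors: int) -> float:
    """log2(n!) = number of bits that the order of n distinct colors can carry."""
    return sum(math.log2(k) for k in range(1, n_colors + 1))


def encode_permutation(palette, number: int):
    """Reorder the palette so that its order encodes `number`."""
    pool = sorted(palette)  # canonical order both sides can recompute
    if number >= math.factorial(len(pool)):
        raise ValueError("number exceeds permutation capacity")
    out = []
    for remaining in range(len(pool), 0, -1):
        idx, number = divmod(number, math.factorial(remaining - 1))
        out.append(pool.pop(idx))
    return out


def decode_permutation(palette) -> int:
    """Recover the hidden number from the order of the palette entries."""
    pool = sorted(palette)
    number = 0
    for i, color in enumerate(palette):
        idx = pool.index(color)
        number += idx * math.factorial(len(palette) - 1 - i)
        pool.pop(idx)
    return number
```

For 256 colors `capacity_bits(256)` comes out just under 1684 bits, roughly 210 bytes, whatever the size of the image, and any program that re-sorts the palette on saving destroys the message.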

Possible approaches
Message hiding in the image data - greedy techniques

Decrease color depth and expand (1 bpp)
1. Collapse 256 colors → 128 colors
2. Expand 128 colors → 256 colors by including a close color (e.g., flip the LSB of the blue channel)
3. Embed a binary message into the LSB of the blue channel of randomly selected pixels
4. Read the message from the LSB of the blue channel

Alternatively (3 bpp)
1. Decrease color depth to 32 colors and include all colors obtained from LSB shuffling of all 32 colors (one color produces $2^3 = 8$ colors)
2. Encode messages into the LSB of pixel colors

1. Assign parity to palette colors
2. Embed message bits as the parity of colors

Possible approaches

Message hiding in the image data

Parity embedding

Message: 0 1 1 0 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 0 0 1 1 1 1

• Randomly choose a pixel with color C1
• Find the color C1 in the sorted palette
• Replace the LSB of the index to color C1 with the message bit
• The new index now points to a neighboring color C2
• Replace the index of the pixel in the original image to point to the new color C2

[Figure: sorted palette; index = 30 = 00011110 points to C1; after the LSB is replaced, the index is either 00011110 (C1) or 00011111 (C2)]

Critical assumption: Colors close in the luminance-sorted palette are also close in the color space.
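The index-LSB procedure on a luminance-sorted palette can be sketched as follows. The luminance weights (0.299, 0.587, 0.114) are an assumption, since the slides only say the palette is sorted by luminance; the palette is assumed to have an even number of entries, and `random.Random` again stands in for key-dependent pixel selection. Function names are hypothetical.

```python
import random


def luminance(color):
    """Assumed luminance formula (standard RGB-to-luma weights)."""
    r, g, b = color
    return 0.299 * r + 0.587 * g + 0.114 * b


def sorted_order(palette):
    """order[s] = palette index at sorted position s; rank = inverse mapping."""
    order = sorted(range(len(palette)), key=lambda i: luminance(palette[i]))
    rank = {pal_idx: s for s, pal_idx in enumerate(order)}
    return order, rank


def ez_embed(palette, indices, bits, key):
    """Put each message bit into the LSB of the pixel's position in the sorted palette."""
    order, rank = sorted_order(palette)
    out = list(indices)
    carriers = random.Random(key).sample(range(len(out)), len(bits))
    for pos, bit in zip(carriers, bits):
        s = (rank[out[pos]] & ~1) | bit  # neighboring entry in the sorted palette
        out[pos] = order[s]              # pixel now points to that color
    return out


def ez_extract(palette, indices, n_bits, key):
    """Read the LSB of each carrier pixel's position in the sorted palette."""
    _, rank = sorted_order(palette)
    carriers = random.Random(key).sample(range(len(indices)), n_bits)
    return [rank[indices[pos]] & 1 for pos in carriers]
```

The palette itself is never changed, so the receiver rebuilds the same sorted order. The weak point is the critical assumption above: two neighbors in luminance order can still be far apart in color.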

(1) For each message bit, randomly select a pixel.
(2) Calculate the set of the closest palette colors (in the Euclidean norm). The distance $d$ between colors $(R_1, G_1, B_1)$ and $(R_2, G_2, B_2)$ is

$d^2 = (R_1 - R_2)^2 + (G_1 - G_2)^2 + (B_1 - B_2)^2$

(3) Find the closest color whose parity agrees with the message bit. The parity of a color is defined as $(R + G + B) \bmod 2$.
(4) Change the index for the pixel to point to the new color.

To extract the secret message, pixels are selected using a key and the secret message is simply read by extracting the parity bits of the colors of selected pixels.
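Steps (1) to (4) and the extraction rule translate almost directly into code. The sketch below assumes the image is a flat list of palette indices and uses `random.Random` as a stand-in for the key-dependent pixel selection; the function names are hypothetical.

```python
import random


def parity(color):
    """Parity of a color: (R + G + B) mod 2."""
    return sum(color) % 2


def dist2(a, b):
    """Squared Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def parity_embed(palette, indices, bits, key):
    """Re-point each carrier pixel to the closest palette color with the wanted parity."""
    out = list(indices)
    carriers = random.Random(key).sample(range(len(out)), len(bits))
    for pos, bit in zip(carriers, bits):
        current = palette[out[pos]]
        candidates = [i for i, c in enumerate(palette) if parity(c) == bit]
        if not candidates:
            raise ValueError("palette has no color with the required parity")
        # If the current color already has the right parity, its distance is 0
        # and the pixel keeps its color.
        out[pos] = min(candidates, key=lambda i: dist2(palette[i], current))
    return out


def parity_extract(palette, indices, n_bits, key):
    """Read the parity of the colors of the key-selected pixels."""
    carriers = random.Random(key).sample(range(len(indices)), n_bits)
    return [parity(palette[indices[pos]]) for pos in carriers]
```

A pixel whose color already has the right parity is left untouched, so on average only about half of the carrier pixels change at all.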

New approach using color parities

Message hiding in the image data

1 bpp

Message: 0 1 1 0 0 1 0 1 0 1 1 1 0 1 0 1 0 1 0 0 0 1 1 1 1

• Randomly choose a pixel with color C1
• Find the closest colors in the palette
• Replace C1 with the closest color that has the same parity as the message bit

Color parity of (R, G, B) = (R + G + B) mod 2.

Advantages over EZ Stego:
• The total change to the image due to message embedding is always smaller
• We avoid occasionally large changes in color that are possible with EZ Stego

Oblivious reading requirement: The optimal parity assignment has to be reconstructable from the modified image at the receiving end.

Optimal parity assignment

• Efficient algorithm for optimal parity assignment
• Optimal parity depends only on the palette and does not depend on the image content!
• The optimal palette is also optimal for multiple-pixel embedding

[Diagram: the optimal parity used to embed the message equals the optimal parity computed from the modified carrier to extract the message]

The average decrease in the RMS error due to optimal palette parity is about 25-35%.

Non-adaptive steganography = modifications due to message embedding are uncorrelated with image features. Examples are LSB encoding in randomly selected pixels, modulation of randomly selected frequency bins in a fixed band, etc.

Adaptive steganography = modifications are correlated with the image content (features).

- Pixels carrying message bits are selected adaptively depending on the image
- Avoiding areas of uniform color
- Selecting pixels with large local standard deviation

Potential problem with message recovery: We have to be able to extract the same set of message carrying pixels at the receiving end from the modified image.

Adaptive Steganography

Artifacts caused by non-adaptive methods

• Large areas of uniform color
• Internal structure of the image - it is a fractal Julia set
• Fonts

[Figure: computer-generated Julia set. Panels: artifacts around the Julia set; artifacts in the fonts.]

Method 1: Adaptive block embedding

Message embedding
• Divide the image into disjoint $3 \times 3$ blocks
• Randomly choose blocks and evaluate some local statistical quantity, such as the standard deviation or the number of colors, and decide whether or not a message bit can be embedded (good vs. bad block)
• If the block is bad, skip it and do not insert a message bit
• If the block is good, insert the bit into the block parity
• If after embedding the block becomes bad, keep the change but repeat the same message bit in the next block

Message extraction
• Generate the same random walk through the image blocks
• Read the parity from all good blocks
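A small sketch of adaptive block embedding, under simplifying assumptions that are not in the slides: the image is an 8-bit grayscale array (list of rows), a block is "good" when the standard deviation of its pixels exceeds a hypothetical threshold, the block parity is the sum of its pixels mod 2, and a bit is inserted by flipping the LSB of the center pixel. `random.Random` stands in for the key-dependent random walk.

```python
import random
from statistics import pstdev

BLOCK = 3        # block side
THRESHOLD = 2.0  # hypothetical "good block" threshold on the standard deviation


def block_pixels(img, r, c):
    return [img[r + i][c + j] for i in range(BLOCK) for j in range(BLOCK)]


def is_good(img, r, c):
    return pstdev(block_pixels(img, r, c)) > THRESHOLD


def block_parity(img, r, c):
    return sum(block_pixels(img, r, c)) % 2


def random_walk(img, key):
    """Key-dependent order of the disjoint blocks; depends only on the image size."""
    coords = [(r, c) for r in range(0, len(img) - BLOCK + 1, BLOCK)
              for c in range(0, len(img[0]) - BLOCK + 1, BLOCK)]
    random.Random(key).shuffle(coords)
    return coords


def embed(img, bits, key):
    out = [row[:] for row in img]
    pending = list(bits)
    for r, c in random_walk(out, key):
        if not pending:
            break
        if not is_good(out, r, c):
            continue                 # bad block: skip, insert nothing
        if block_parity(out, r, c) != pending[0]:
            out[r + 1][c + 1] ^= 1   # flip the LSB of the center pixel
        if is_good(out, r, c):
            pending.pop(0)           # still good: the receiver will read this bit
        # otherwise the change is kept and the same bit goes into the next good block
    if pending:
        raise ValueError("not enough good blocks for the message")
    return out


def extract(img, n_bits, key):
    bits = []
    for r, c in random_walk(img, key):
        if len(bits) == n_bits:
            break
        if is_good(img, r, c):
            bits.append(block_parity(img, r, c))
    return bits
```

Because bad blocks are never modified and a block that turns bad during embedding is simply skipped by the receiver, both sides agree on the set of message-carrying blocks without any side information.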

Limitations

Ultimately, image understanding is important for secure adaptive steganography. A human can easily recognize that a pixel is actually a dot above the letter "i" and must not be changed. However, it would be very hard to write a computer program capable of making such intelligent decisions in all possible cases.

Example of a difficult area for secure adaptive message embedding - fonts on a complex background

Embedding while dithering

True-color images are converted to palette images via
- color quantization
- dithering

Idea: To embed message bits while doing the dithering

[Diagram: 256-color image → Increase color depth by interpolating → True-color image (or start directly with the true-color image) → Compute palette → Quantize, Dither and Embed]

Embedding while dithering

Palette P = {q1, …, q256}

Q: quantization of a pixel to a palette color
• Non-message pixels: q is the closest palette color
• Message pixels: q is the closest palette color with the right parity

Rounding error is added to the next pixel: E11 = q11 - p11

[Figure: pixels p11 and p12 + E11 of the original 24-bit image are mapped by Q to pixels q11 and q12 of the dithered quantized image]

1. Select a random collection of pixels that will carry message bits.
2. For non-message pixels use classical dither to the closest palette color.
3. For message pixels dither to the closest color with the right parity.
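The three steps can be sketched with the simplest form of error diffusion, in which the whole rounding error of a pixel is passed to the next pixel in scan order. Several details are assumptions made for illustration: the image is a flat list of (R, G, B) tuples, the palette is already computed, the error carried forward is the wanted color minus the chosen palette color (the usual error-diffusion sign convention), and the wanted color is clamped to 0..255. Function names are hypothetical.

```python
import random


def parity(color):
    return sum(color) % 2


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dither_embed(pixels, palette, bits, key):
    """Quantize true-color pixels to palette indices, embedding bits as color parities."""
    carriers = random.Random(key).sample(range(len(pixels)), len(bits))
    bit_at = dict(zip(carriers, bits))
    indices = []
    error = (0.0, 0.0, 0.0)
    for pos, p in enumerate(pixels):
        # Wanted color = pixel plus the rounding error of the previous pixel.
        wanted = tuple(min(255.0, max(0.0, v + e)) for v, e in zip(p, error))
        choices = list(range(len(palette)))
        if pos in bit_at:  # message pixel: only colors with the right parity
            choices = [i for i in choices if parity(palette[i]) == bit_at[pos]]
            if not choices:
                raise ValueError("palette has no color with the required parity")
        q = min(choices, key=lambda i: dist2(palette[i], wanted))
        indices.append(q)
        error = tuple(w - c for w, c in zip(wanted, palette[q]))
    return indices


def dither_extract(indices, palette, n_bits, key):
    """Read the parities of the colors at the key-selected pixels."""
    carriers = random.Random(key).sample(range(len(indices)), n_bits)
    return [parity(palette[indices[pos]]) for pos in carriers]
```

The extra error made at a message pixel is diffused like any other rounding error, so the embedding changes blend into the dither pattern instead of being added on top of it.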

Performance example

[Figure: test image in JPEG format. Panels: Original; Non-adaptive; Embedding while dithering.]

Outline

• Introduction, history, motivation, definition, terminology, properties
• Covert communication (steganography)
• Digital watermarking (robust message embedding)
- Copyright protection of digital images (authentication)
- Fingerprinting (traitor-tracing)
- Adding captions to images, additional information to videos
- Methods for Robust Data Hiding (Watermarking)
- Image integrity protection (fraud detection)
- Copy control in DVD
• Watermarking for tamper detection and authentication
• Attacks on watermarks
• Open problems, challenges
