
PERCEPTUAL VIDEO HASHING BASED ON ACHLIOPTAS’S RANDOM PROJECTIONS

Sandeep R.

Supervisor: Prof. P. K. Bora

Department of Electronics and Electrical Engineering,

Indian Institute of Technology Guwahati.

Alliance University Interview

19-March-2014


OUTLINE

1 Perceptual Hashing.

2 Prior Work and Motivation.

3 Theoretical Framework.

4 Proposed Algorithm.

5 Simulation Results.

6 Conclusions and Future Works.


PERCEPTUAL HASHING

Content-identification problem.

Challenge of finding perceptually similar videos in a large database.

Content-authentication problem.

Authenticating a video even though its digital representation differs from the original.

Cryptographic hashing technique.

Perceptual hashing technique.


PERCEPTUAL HASHING (CONT’D)

A perceptual hash function $H$ takes two inputs, a video $V \in \mathcal{V}$ and a secret key $K \in \mathcal{K}$, to produce a $k$-bit binary hash value $h = H(V,K)$.

Properties of $H(V,K)$ [Venkatesan 2000, Monga 2005]

1 One-way Function:

$V \mapsto H(V,K)$.

2 Compactness:

$\mathrm{Size}(H(V,K)) \ll \mathrm{Size}(V)$.

3 Perceptual Robustness:

$\Pr\big(d_H(H(V,K), H(V_{\mathrm{sim}},K)) \le \tau_1\big) \approx 1$.

4 Visual Fragility:

$\Pr\big(d_H(H(V,K), H(V_{\mathrm{diff}},K)) > \tau_2\big) \approx 1$.

5 Unpredictability:

$\Pr\big(d_H(H(V,K_1), H(V,K_2)) > \tau_3\big) \approx 1$.
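The robustness and fragility properties both reduce to comparing a hash distance against a threshold. The sketch below assumes that $d_H$ is the normalized Hamming distance between two equal-length binary hashes. The function names, the toy hashes and the threshold value are illustrative and are not taken from the slides.

```python
import numpy as np

def normalized_hamming(h1, h2):
    """Fraction of bit positions in which two equal-length binary hashes differ."""
    h1 = np.asarray(h1, dtype=np.uint8)
    h2 = np.asarray(h2, dtype=np.uint8)
    if h1.shape != h2.shape:
        raise ValueError("hashes must have the same length")
    return float(np.mean(h1 != h2))

def same_content(h1, h2, tau):
    """Declare two videos perceptually similar when the distance is at most tau."""
    return normalized_hamming(h1, h2) <= tau

# Toy 8-bit hashes (illustrative values only).
h_ref = [1, 0, 1, 1, 0, 0, 1, 0]
h_sim = [1, 0, 1, 1, 0, 1, 1, 0]    # differs from h_ref in 1 of 8 bits
h_diff = [0, 1, 0, 1, 1, 1, 0, 0]   # differs from h_ref in 6 of 8 bits

tau = 0.2  # hypothetical decision threshold
print(normalized_hamming(h_ref, h_sim), same_content(h_ref, h_sim, tau))
print(normalized_hamming(h_ref, h_diff), same_content(h_ref, h_diff, tau))
```

In practice the threshold would be chosen from the distance distributions of similar and different video pairs rather than fixed in advance.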


PRIOR WORK AND MOTIVATION

Prior Work.

Radial projection of key frames. [DeRoover 2005, Roover 2005]

3D-DCT. [Coskun 2004, Coskun 2006]

Centroid of gradient orientations. [Lee 2006, Lee 2008]

Multilinear sub-space projections. [Li 2011, Li 2012]

Fast Johnson-Lindenstrauss transform (FJLT). [Lv 2008, Lv 2009]

Motivation.

Distance-preserving property.

Computationally efficient.

Secure.

Essential features captured with minor distortion.


THEORETICAL FRAMEWORK

LEMMA (JOHNSON-LINDENSTRAUSS LEMMA [JOHNSON 1982])

A set of $n$ points in $d$-dimensional Euclidean space can be mapped into $k$-dimensional Euclidean space without distorting the distances between any pair of points by more than a factor of $(1 \pm \varepsilon)$.

$k$ is logarithmic in $n$ and independent of $d$, with $k \ll d$.

FIGURE: Illustration of the JL lemma. (A) Points in $d$ dimensions. (B) Points in $k$ dimensions.
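For reference, a common way of writing the lemma is given below. The symbols $P$ (the set of $n$ points) and $f : \mathbb{R}^d \to \mathbb{R}^k$ (the mapping) are introduced here for illustration. The inequality is stated for squared distances, as is usual in the literature, and the constant hidden in the bound on $k$ is left unspecified.

```latex
(1-\varepsilon)\,\lVert u - v \rVert^2
  \;\le\; \lVert f(u) - f(v) \rVert^2
  \;\le\; (1+\varepsilon)\,\lVert u - v \rVert^2
  \quad \text{for all } u, v \in P,
  \qquad k = O\!\left(\varepsilon^{-2} \log n\right).
```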

ACHLIOPTAS’S RANDOM MATRIX (ARM)

Projection satisfies JL lemma.

$R_{\mathrm{ARM}}(i,j) = \begin{cases} +1 & \text{with probability } 0.5 \\ -1 & \text{with probability } 0.5 \end{cases}$ [Achlioptas 2001, Achlioptas 2003]

Projection involves only additions and subtractions.
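A minimal sketch of the ARM and an empirical check of its distance-preserving behaviour follows. The sizes, the seed and the synthetic Gaussian points are assumptions chosen for illustration. The projection is written as a matrix product for brevity, although with ±1 entries it amounts to additions and subtractions only.

```python
import numpy as np

def achlioptas_matrix(k, d, rng):
    """k x d matrix with i.i.d. entries +1 or -1, each with probability 0.5."""
    return rng.choice(np.array([-1, 1], dtype=np.int8), size=(k, d))

def project(R, X):
    """Project the columns of X (d x n) to k dimensions: (1/sqrt(k)) R X."""
    k = R.shape[0]
    return (R.astype(np.float64) @ X) / np.sqrt(k)

rng = np.random.default_rng(0)     # seed chosen only for reproducibility
d, k, n = 4096, 512, 20            # illustrative sizes, not from the slides
X = rng.standard_normal((d, n))    # n synthetic points in d dimensions
R = achlioptas_matrix(k, d, rng)
Y = project(R, X)

# Ratio of projected distance to original distance for every pair of points.
ratios = np.array([
    np.linalg.norm(Y[:, i] - Y[:, j]) / np.linalg.norm(X[:, i] - X[:, j])
    for i in range(n) for j in range(i + 1, n)
])
print(ratios.min(), ratios.max())  # both should lie close to 1
```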


PROPOSED ALGORITHM

FIGURE: Block diagram of the proposed video hashing algorithm

Input

RGB video $V_{\mathrm{in}}$, hash size $k$ and algorithm parameters.

Preprocessing and normalization

Convert $V_{\mathrm{in}}$ into a sequence of gray-level frames, $V_{\mathrm{gray}}$.

Normalize $V_{\mathrm{gray}}$ via temporal sub-sampling and spatial resizing to obtain $V$ of size $64 \times 64 \times 64$.

Randomization

Depending on the secret key $K$, randomly select $N$ ($N = 32$) overlapping 3D sub-blocks $V_i$, $i = 1, 2, \ldots, N$, each of size $U \times V \times W$.

Concatenate the columns of each sub-block $V_i$ to form a vector $f_i \in \mathbb{R}^d$ such that $d = U \times V \times W$.

Construct the feature matrix $F \in \mathbb{R}^{d \times N}$ from the vectors $f_i$ to obtain $F = [f_1\ f_2\ \ldots\ f_N]$.
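The normalization and randomization steps can be sketched as follows. Several details are assumptions, because the slides do not specify them: the array layout (frames, height, width), nearest-neighbour sampling for the resizing, a 16×16×16 sub-block size, and the use of the secret key as a seed for the pseudo-random generator. The function names are hypothetical.

```python
import numpy as np

def normalize_video(v_gray, size=64):
    """Sub-sample frames and resize each frame to size x size (nearest neighbour).

    v_gray: gray-level video of shape (frames, height, width).
    """
    t, h, w = v_gray.shape
    ti = np.linspace(0, t - 1, size).round().astype(int)
    yi = np.linspace(0, h - 1, size).round().astype(int)
    xi = np.linspace(0, w - 1, size).round().astype(int)
    return v_gray[np.ix_(ti, yi, xi)].astype(np.float64)

def feature_matrix(v, key, n_blocks=32, block=(16, 16, 16)):
    """Pick n_blocks key-dependent, possibly overlapping 3D sub-blocks as columns."""
    rng = np.random.default_rng(key)   # assumption: the secret key seeds the selection
    bu, bv, bw = block
    cols = []
    for _ in range(n_blocks):
        # Random corner such that the sub-block fits inside the volume.
        a = rng.integers(0, v.shape[0] - bu + 1)
        b = rng.integers(0, v.shape[1] - bv + 1)
        c = rng.integers(0, v.shape[2] - bw + 1)
        sub = v[a:a + bu, b:b + bv, c:c + bw]
        cols.append(sub.reshape(-1, order="F"))   # column-major flattening
    return np.stack(cols, axis=1)                 # shape (d, N) with d = U*V*W

# Synthetic stand-in for a gray-level video with 100 frames of 144 x 192 pixels.
video = np.random.default_rng(1).integers(0, 256, size=(100, 144, 192))
V = normalize_video(video)
F = feature_matrix(V, key=1234)
print(V.shape, F.shape)
```

Because the sub-block positions depend only on the key, the same key reproduces the same feature matrix, while a different key selects different sub-blocks.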


PROPOSED ALGORITHM (CONT’D)

Random projections

Apply the ARM projection to $F$ to get the hash matrix $H = \frac{1}{\sqrt{k}} R F$, where $R$ is the $k \times d$ ARM, so that $H$ is of size $k \times N$.

Hash computation

Generate the intermediate hash vector $h' = \sum_{i=1}^{N} \frac{h_i}{N}$, where $h_i$ is the $i$th column of the matrix $H$.

Obtain the median $\psi$ of $h'$.

Binarize the elements of $h'$ to obtain $h = [h_1\ h_2\ \ldots\ h_k]^T$, where

$h_j = \begin{cases} 0, & h'_j < \psi \\ 1, & h'_j \ge \psi \end{cases} \qquad j = 1, 2, \ldots, k.$

Output

A hash vector h
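The projection and hash computation steps above can be sketched in a few lines. The feature matrix here is synthetic, the sizes are illustrative, and the way the ±1 matrix is generated (from a seeded generator) is an assumption, since the slides do not say how it is tied to the key. The function name is hypothetical.

```python
import numpy as np

def compute_hash(F, R):
    """Binary hash from a d x N feature matrix F and a k x d matrix R of +/-1 entries."""
    k = R.shape[0]
    H = (R @ F) / np.sqrt(k)          # hash matrix, k x N
    h_prime = H.mean(axis=1)          # intermediate hash: average of the N columns
    psi = np.median(h_prime)          # median used as the binarization threshold
    return (h_prime >= psi).astype(np.uint8)

rng = np.random.default_rng(7)        # seed chosen only for reproducibility
d, N, k = 4096, 32, 64                # illustrative sizes
F = rng.standard_normal((d, N))       # synthetic stand-in for a feature matrix
R = rng.choice(np.array([-1.0, 1.0]), size=(k, d))

h = compute_hash(F, R)
h_noisy = compute_hash(F + 0.001 * rng.standard_normal((d, N)), R)
print(h.sum(), np.mean(h != h_noisy))
```

Thresholding at the median gives a hash with equal numbers of zeros and ones whenever $k$ is even and the entries of $h'$ are distinct.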


SIMULATION RESULTS

Parameter                   Details
Number of videos            224 [xiph.org, reefvid.org]
Spatial resolutions         192×144, 352×288, 384×288 and 512×288
Frame rates (fps)           15, 25 and 30
Minimum number of frames    66
Maximum number of frames    1079


SIMULATION RESULTS (CONT’D)

FIGURE: ROC curves for single attacks. (A) Frame rotation by 10°. (B) Regular frame dropping.



FIGURE: ROC curves for single attacks. (A) MPEG4 compression with average CR 30:1. (B) Gaussian blurring using a 5×5 mask.



FIGURE: ROC curves for single attacks. (A) Frame rate decreased to 15 fps. (B) Contrast decreased by 10%.



FIGURE: ROC curves for multiple attacks. (A) AWGN with mean 0 and standard deviation 60, and brightness increased by 10%. (B) Regular frame dropping and 5° rotation.
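The slides do not describe how the ROC curves were computed. A usual construction, sketched here under that assumption, sweeps a decision threshold over the hash distances of similar pairs (original versus attacked video) and of different pairs. The function name and the distance values are made up for illustration and are not results from the experiments.

```python
import numpy as np

def roc_points(d_similar, d_different, thresholds):
    """Return (false-positive rate, true-positive rate) for each threshold.

    d_similar:   distances between hashes of perceptually similar video pairs.
    d_different: distances between hashes of perceptually different video pairs.
    A pair is declared similar when its distance is at most the threshold.
    """
    d_similar = np.asarray(d_similar, dtype=float)
    d_different = np.asarray(d_different, dtype=float)
    tpr = np.array([(d_similar <= t).mean() for t in thresholds])
    fpr = np.array([(d_different <= t).mean() for t in thresholds])
    return fpr, tpr

# Illustrative distances only.
d_sim = [0.05, 0.10, 0.12, 0.20]
d_diff = [0.40, 0.45, 0.52, 0.18]
fpr, tpr = roc_points(d_sim, d_diff, thresholds=[0.0, 0.15, 0.3, 1.0])
print(fpr, tpr)
```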


CONCLUSIONS AND FUTURE WORKS

Conclusions

Proposed a video hashing algorithm based on random projections using Achlioptas’s random matrix.

The proposed algorithm showed excellent robustness to most of the single and multiple image processing attacks tested.

Because Achlioptas’s random projection is computationally efficient, the hash is suitable for content-based video retrieval applications.

Future Works

Studying the performance of the proposed algorithm under malicious attacks.

Comparing the performance of the proposed algorithm with that of other random projection based techniques such as FJLT.


BIBLIOGRAPHY I

[1] R. Venkatesan, S.-M. Koon, M. Jakubowski, and P. Moulin, “Robust image hashing,” in Image Processing, 2000. Proceedings. 2000 International Conference on, vol. 3, 2000, pp. 664–666.

[2] V. Monga, “Perceptually based methods for robust image hashing,” Ph.D. thesis, The University of Texas at Austin, Aug. 2005.

[3] C. De Roover, C. De Vleeschouwer, F. Lefebvre, and B. Macq, “Robust video hashing based on radial projections of key frames,” Signal Processing, IEEE Transactions on, vol. 53, no. 10, pp. 4020–4037, Oct. 2005.

[4] C. D. Roover, C. D. Vleeschouwer, F. Lefèbvre, and B. M. Macq, “Robust image hashing based on radial variance of pixels,” in ICIP (3), 2005, pp. 77–80.

[5] B. Coskun and B. Sankur, “Robust video hash extraction,” in Signal Processing and Communications Applications Conference, 2004. Proceedings of the IEEE 12th, Apr. 2004, pp. 292–295.

[6] B. Coskun, B. Sankur, and N. Memon, “Spatio-temporal transform based video hashing,” Multimedia, IEEE Transactions on, vol. 8, no. 6, pp. 1190–1208, Dec. 2006.

[7] S. Lee and C. Yoo, “Video fingerprinting based on centroids of gradient orientations,” in Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on, vol. 2, May 2006, p. II.

[8] S. Lee and C. Yoo, “Robust video fingerprinting for content-based video identification,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 18, no. 7, pp. 983–988, July 2008.


BIBLIOGRAPHY II

[9] M. Li and V. Monga, “Desynchronization resilient video fingerprinting via randomized, low-rank tensor approximations,” in Multimedia Signal Processing (MMSP), 2011 IEEE 13th International Workshop on, Oct. 2011, pp. 1–6.

[10] M. Li and V. Monga, “Robust video hashing via multilinear subspace projections,” Image Processing, IEEE Transactions on, vol. 21, no. 10, pp. 4397–4409, Oct. 2012.

[11] X. Lv and Z. Wang, “Fast Johnson-Lindenstrauss transform for robust and secure image hashing,” in Multimedia Signal Processing, 2008 IEEE 10th Workshop on, Oct. 2008, pp. 725–729.

[12] X. Lv and Z. J. Wang, “An extended image hashing concept: content-based fingerprinting using FJLT,” EURASIP J. Inf. Secur., vol. 2009, pp. 2:1–2:16, Jan. 2009.

[13] W. B. Johnson and J. Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space,” in Contemporary Mathematics, Proceedings of the Conference on Modern Analysis and Probability, A. B. Richard Beals, Anatole Beck and A. Hajian, Eds., vol. 26, 1984, pp. 189–206.

[14] D. Achlioptas, “Database-friendly random projections.” ACM Press, 2001, pp. 274–281.

[15] D. Achlioptas, “Database-friendly random projections: Johnson-Lindenstrauss with binary coins,” J. Comput. Syst. Sci., vol. 66, no. 4, pp. 671–687, Jun. 2003.

[16] “http://media.xiph.org/video/derf/,” test video sequences.

[17] “http://www.reefvid.org/,” test video sequences.


THANKS TO ONE AND ALL.

