Development of System to measure strength of CAPTCHA

By Anjali A. Chandavale, [email protected], 9881498695

Guide: Dr. A. M. Sapkal, Professor (E&T/C), COEP, Pune

16-Dec-12



Introduction

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart.

Existing Tools Vs Proposed System

Parameter | CAPTCHA Sniper | CAPTCHA Infinity | TesserCAP 1.0 | Proposed System
Objective | Breaking and bypassing the CAPTCHA | Breaking and bypassing the CAPTCHA | Breaking the CAPTCHA | Breaking the CAPTCHA and measuring its strength
Cost | 96 dollars | 46 dollars | Free | Free
Human intervention | Yes, semi-automatic | Automated | Yes, complete | 50%, semi-automated
Response time | Seconds | Seconds | Seconds | Milliseconds
Limitation | Can break only disconnected CAPTCHAs | Can break only disconnected CAPTCHAs | Can break only disconnected CAPTCHAs | Designed for disconnected, overlapped and connected CAPTCHAs

Objectives

• Attack CAPTCHAs with variable-length, connected characters.
• Attack CAPTCHAs that use variations in color as noise.
• Help build more robust CAPTCHAs while maintaining human friendliness.
• Provide security for socially relevant free Internet services.

Fig. 1 Proposed System

Fig. 2 CBM

The proposed system processes a TBC (text-based CAPTCHA) image in three stages:

1. Preprocessing of the image and noise calculation, producing a cleaned image together with its TBC type.
2. Segmentation of the cleaned image and calculation of the TBC length, producing the segmented image.
3. Recognition of the segmented characters and measurement of the response time.

Output: type and amount of noise, length of the TBC, recognized characters and response time.
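The three-stage data flow above can be sketched as a single function that chains the stages and collects the reported parameters. This is a minimal illustration under stated assumptions, not the authors' implementation: the callables `preprocess`, `segment` and `recognize`, the function name `break_tbc` and the dictionary keys are all hypothetical; only the stage order and the output parameters come from the slide. Measuring the response time over the whole attack is also an assumption.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def break_tbc(
    image: Any,
    preprocess: Callable[[Any], Tuple[Any, str, float]],
    segment: Callable[[Any], List[Any]],
    recognize: Callable[[Any], str],
) -> Dict[str, Any]:
    """Run the three stages on one TBC image (hypothetical stage callables)."""
    start = time.perf_counter()
    # Stage 1: cleaned image, its TBC type and an amount-of-noise measure.
    cleaned, tbc_type, noise_amount = preprocess(image)
    # Stage 2: one sub-image per segmented character; its count is the TBC length.
    segments = segment(cleaned)
    # Stage 3: recognize every segmented character.
    text = "".join(recognize(s) for s in segments)
    # Assumption: response time covers the whole attack, reported in milliseconds.
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {
        "tbc_type": tbc_type,
        "noise_amount": noise_amount,
        "length": len(segments),
        "recognized": text,
        "response_time_ms": elapsed_ms,
    }
```

With stub stages, `break_tbc("img", lambda im: (im, "Normal", 0.1), lambda im: ["a", "b", "c"], str.upper)` reports a length of 3 and the recognized text `"ABC"`. Passing the stages in as functions keeps the timing and bookkeeping separate from any particular preprocessing or segmentation method.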

Performance Analysis

[Figure: performance-analysis charts]

Text-based CAPTCHAs are vulnerable to the proposed preprocessing, segmentation and character recognition (CR) attacks.

Contribution

The proposed CAPTCHA Breaker has the following improved features compared with existing tools:

• Determines strength measurement parameters.
• Categorizes TBCs.
• Response time is in milliseconds.
• Breaks connected TBCs.
• Breaks variable-length TBCs.
• Breaks TBCs that combine a light background and a dark foreground of the same color, and TBCs with a single character in multiple colors.
• Breaks various types of TBC, with a feature to load a TBC image directly.

Future Plan

• Breaking segmentation-resistant TBCs.
• Breaking connected TBC images with variations in the thickness and width of characters.
• Designing and implementing an algorithm for measuring the strength of TBCs.
• Analyzing the performance of the developed system.


• Noise
• The use of color
• Clutter
• Confusing characters
• Characters used in TBC
• Character set
• Character length
• Recognition rate
• Response time

Categorization of TBC

Hollow TBC, Chess Board TBC, Normal TBC

Characters used

CAPTCHAs with (a) disconnected characters, (b) overlapped characters (characters not at the same level) and (c) connected characters.

Fig. 3 Preprocessing Attack: Process for CAPTCHA Identifier

Start → Convert TBC to grey scale → Binarization → Type? (Chess / Hollow / Normal) → Line, dot removal → Discontinuity removal → Stop
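The first preprocessing steps (grey-scale conversion, binarization and dot removal) can be sketched on a plain list-of-lists image. This is a hedged sketch, not the authors' code: the slides state neither the grey-scale formula nor the threshold, so the ITU-R BT.601 luma weights and the fixed threshold of 128 are assumptions, and "dot removal" is interpreted here as clearing foreground pixels that have no foreground neighbour. Line removal and discontinuity removal are not shown. The function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_grey(rgb_image: List[List[RGB]]) -> List[List[float]]:
    # Assumed conversion: ITU-R BT.601 luma weights.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_image]


def binarize(grey: List[List[float]], threshold: float = 128.0) -> List[List[int]]:
    # 1 = dark (black, character ink), 0 = light (white background).
    # The fixed global threshold is an assumption.
    return [[1 if v < threshold else 0 for v in row] for row in grey]


def remove_isolated_dots(binary: List[List[int]]) -> List[List[int]]:
    # Clear black pixels that have no black pixel among their 8 neighbours.
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1:
                continue
            has_neighbour = any(
                binary[ny][nx] == 1
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                out[y][x] = 0
    return out
```

For example, `remove_isolated_dots([[1, 0, 0], [0, 0, 0], [0, 1, 1]])` clears the lone pixel in the top-left corner and keeps the two adjacent pixels in the bottom row.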


The algorithm for Chess Board TBC identification is as follows:

1. Calculate the width and height of the alternating color sections.
2. Calculate the average measure using eqn. 1:

$$\bar{w} = \frac{1}{K}\sum_{i=1}^{K} w_i \qquad (1)$$

where K is the number of width samples taken and w_i is the i-th width sample.
3. Calculate the deviation of each measure from the average measure using eqn. 2:

$$d_i = \left| w_i - \bar{w} \right| \qquad (2)$$

4. Finally, obtain the ratio of the deviation to the average as shown in eqn. 3:

$$r_i = \frac{d_i}{\bar{w}} \qquad (3)$$

5. If the ratio is within a tolerance of 10% (r_i ≤ 0.10), the dimensions are considered equal or nearly equal.
6. Repeat steps 2-5 for the height. If the test succeeds for both width and height, the cells are of fairly equal size and the CAPTCHA is identified as a BotDetect CAPTCHA; otherwise the test fails and the algorithm proceeds to the hollow CAPTCHA check.
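Steps 2-6 can be sketched as follows. This is a minimal sketch under assumptions: step 1 (measuring the widths and heights of the alternating color sections) is not shown, eqn. 3 is applied to every sample, and "within a tolerance of 10%" is read as r_i ≤ 0.10 for all samples. The function names are hypothetical.

```python
from typing import Sequence


def nearly_equal(measures: Sequence[float], tolerance: float = 0.10) -> bool:
    """Steps 2-5: True if every measure deviates from the mean by at most `tolerance` of it."""
    k = len(measures)
    if k == 0:
        return False
    avg = sum(measures) / k              # eqn. (1)
    if avg == 0:
        return False
    for m in measures:
        deviation = abs(m - avg)         # eqn. (2)
        if deviation / avg > tolerance:  # eqn. (3) compared with the 10% tolerance
            return False
    return True


def is_chess_board(widths: Sequence[float], heights: Sequence[float]) -> bool:
    """Step 6: cells must be nearly equal in both width and height."""
    return nearly_equal(widths) and nearly_equal(heights)
```

For instance, `is_chess_board([10, 10.5, 10, 9.5], [8, 8, 8, 8])` returns `True` (the largest width ratio is 0.05), while widths `[10, 20, 10, 10]` fail because one sample deviates from the mean of 12.5 by 60%.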

The algorithm for Hollow TBC identification is as follows:

1. Fill the background of the image with black using a boundary fill algorithm.
2. Calculate the number of pixels inside the characters using eqn. 1:

$$P_{in} = N - X - Y \qquad (1)$$

where N = height × width of the image in pixels is the total number of pixels, X is the number of pixels filled during the boundary fill and Y is the number of black pixels initially present in the binary image.
3. Calculate the ratio of the pixels inside the characters to the total number of pixels in the image, as indicated in eqn. 2:

$$R = \frac{P_{in}}{N} \qquad (2)$$

4. If the calculated ratio satisfies the observed criterion, the CAPTCHA is considered Hollow; otherwise the check fails and the CAPTCHA is considered a CAPTCHA with clutter.
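The hollow check can be sketched with a breadth-first boundary fill seeded from the white pixels on the image border. This is a sketch under assumptions: the image is binary with 1 for black strokes and 0 for white, the fill is 4-connected, and because the slides only say the ratio must "satisfy the observation", the threshold `min_ratio` and the direction of the comparison are hypothetical parameters, as are the function names.

```python
from collections import deque
from typing import List


def inside_pixel_ratio(binary: List[List[int]]) -> float:
    """R = P_in / N with P_in = N - X - Y (eqns. 1 and 2); 1 = black, 0 = white."""
    h, w = len(binary), len(binary[0])
    total = h * w                                     # N
    y_black = sum(v for row in binary for v in row)   # Y: black pixels initially
    filled = [row[:] for row in binary]
    queue = deque()
    # Seed the fill with every white pixel on the border (assumed background).
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and filled[y][x] == 0:
                filled[y][x] = 1
                queue.append((y, x))
    x_filled = len(queue)                             # X counts seeds plus later fills
    # 4-connected boundary fill of the background with black.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and filled[ny][nx] == 0:
                filled[ny][nx] = 1
                x_filled += 1
                queue.append((ny, nx))
    inside = total - x_filled - y_black               # eqn. (1)
    return inside / total                             # eqn. (2)


def is_hollow(binary: List[List[int]], min_ratio: float) -> bool:
    # Hypothetical criterion: enough enclosed white pixels means a hollow TBC.
    return inside_pixel_ratio(binary) >= min_ratio
```

On a 5 × 5 image containing a one-pixel-thick square ring, N = 25, X = 16 and Y = 8, so one enclosed pixel remains and R = 1/25; filling the centre of the ring makes R = 0.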

Preprocessing

Fig. 4 Preprocessing output: original image and image cleaned by the preprocessing attack


Fig. 5 Segmentation Attack

Start → Segment the image into a number of sub-images → Sub-image has > 1 character? → Sub-image is connected from top? → Segment sub-image from top → Sub-image is connected from bottom? → Segment sub-image from bottom → Sub-image = last character of image? → Take next sub-image → Stop

Fig. 6 Segmentation output

The accuracy is based on the number of characters in the different images. For example, every image in I-Tax has 6 characters. If the algorithm segments 40 characters from 10 images, the accuracy is 40 / (10 × 6) ≈ 0.667, or about 66.7%.
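The accuracy measure above is a simple ratio; the helper below (its name is hypothetical) reproduces the I-Tax example and assumes every image in the test set has the same number of characters.

```python
def segmentation_accuracy(segmented_chars: int, num_images: int, chars_per_image: int) -> float:
    """Fraction of characters segmented over a set of fixed-length TBC images."""
    return segmented_chars / (num_images * chars_per_image)


# I-Tax example: 40 characters segmented from 10 six-character images.
print(f"{segmentation_accuracy(40, 10, 6):.2%}")  # 66.67%
```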

Name of Algorithm | Principle | Accuracy | Type of characters | Limitation
K-means Segmenter [35] | Cluster-based segmentation | 73% | Both discontinuous and continuous characters | Unstable and iterative process
Microsoft TBC segmenter [24] | Noise arcs are removed and candidate segments are identified by color filling | 67% | Discontinuous characters and characters having arcs | Overlapped arcs and connected characters
Skeletonization [21] | Skeleton of text retaining sharpness, position and connectivity of images | 58% | Discontinuous and overlapped | Distortion in images
Nearest Neighbour | Euclidean distance | 79% | Discontinuous and continuous characters | Time consuming
Proposed Algorithm | Projection value, Snake Game | 85% | All | Variation in thickness of characters in an image
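The "projection value" principle in the last row can be illustrated with a plain vertical-projection splitter. This is a generic sketch of projection-based segmentation, not the proposed algorithm: the "Snake Game" step that handles connected characters is not reproduced, and the function names are hypothetical. It assumes a binary image with 1 for black pixels and shows why connected characters defeat projection alone: touching characters leave no empty column between them and come back as one segment.

```python
from typing import List, Optional, Tuple


def vertical_projection(binary: List[List[int]]) -> List[int]:
    # Number of black (1) pixels in each column.
    return [sum(row[x] for row in binary) for x in range(len(binary[0]))]


def split_by_projection(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of runs of non-empty columns."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, count in enumerate(vertical_projection(binary)):
        if count > 0 and start is None:
            start = x                     # a character begins
        elif count == 0 and start is not None:
            segments.append((start, x))   # an empty column ends it
            start = None
    if start is not None:                 # a character touches the right edge
        segments.append((start, len(binary[0])))
    return segments
```

For the image `[[1, 0, 1, 1, 0], [1, 0, 0, 1, 0]]` the projection is `[2, 0, 1, 2, 0]`, giving the segments `[(0, 1), (2, 4)]`.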


Comparison of Algorithms

[Figure: bar chart of accuracy (%) by type of algorithm (Proposed Algorithm, Snake Segmentation, Projection Based) for connected characters and for disconnected & overlapped characters]

Character recognition (CR) by digitized pattern with ANN

• Calculate the I matrix from the binarized input character pattern.
• Calculate the weight matrix.
• Calculate the candidate score.
• Calculate the ideal weight model score.
• Calculate the recognition coefficient Q(K) as the ratio of the candidate score to the ideal weight model score for every learnt Kth character.
• The input candidate pattern is recognized as the character for which Q(K) has the maximum value.
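The slides list the steps but not the formulas, so the sketch below assumes a common formulation of this digitized-pattern technique. The input matrix I holds +1 for black and -1 for white pixels. A character's weight matrix is the element-wise sum of the I matrices of its training samples. The candidate score is the sum of the element-wise products of the weights and I. The ideal weight model score is the sum of the positive weights, and Q(K) is the candidate score divided by the ideal score. All of these choices and the function names are assumptions for illustration, not the authors' exact method.

```python
from typing import Dict, List

Pattern = List[List[int]]  # binarized character grid: 1 = black, 0 = white
Matrix = List[List[int]]


def input_matrix(pattern: Pattern) -> Matrix:
    # Assumed encoding: +1 for a black pixel, -1 for a white pixel.
    return [[1 if p else -1 for p in row] for row in pattern]


def learn_weights(samples: List[Pattern]) -> Matrix:
    # Assumed learning rule: element-wise sum of the samples' input matrices.
    rows, cols = len(samples[0]), len(samples[0][0])
    weights = [[0] * cols for _ in range(rows)]
    for sample in samples:
        im = input_matrix(sample)
        for r in range(rows):
            for c in range(cols):
                weights[r][c] += im[r][c]
    return weights


def candidate_score(weights: Matrix, im: Matrix) -> int:
    return sum(w * i for w_row, i_row in zip(weights, im) for w, i in zip(w_row, i_row))


def ideal_score(weights: Matrix) -> int:
    # Ideal weight model score: sum of the positive weights only.
    return sum(w for row in weights for w in row if w > 0)


def recognize(pattern: Pattern, models: Dict[str, Matrix]) -> str:
    im = input_matrix(pattern)
    q = {}
    for char, weights in models.items():
        mu = ideal_score(weights)
        if mu > 0:  # skip degenerate models without positive weights
            q[char] = candidate_score(weights, im) / mu  # Q(K)
    # Raises ValueError if no usable model exists.
    return max(q, key=q.get)
```

With one 3 × 3 sample each for a vertical bar "I" and an "L" shape, the bar pattern scores Q = 9/3 = 3 against the "I" model and Q = -3/5 against the "L" model, so it is recognized as "I".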


[1] A. A. Chandavale and A. M. Sapkal, "An Improved Adaptive Noise Reduction for Secured CAPTCHA," Fourth International Conference on Emerging Trends in Engineering & Technology (ICETET 2011), pp. 12-18, 2011, published in IEEE Xplore.

[2] A. A. Chandavale and A. M. Sapkal, "Reduced process thinning algorithm for CAPTCHA strength measurement," International Journal of Computer Science & Application, Vol. 1, No. 1, pp. 1-6, 2011.

[3] A. A. Chandavale and A. M. Sapkal, "Algorithm for secured online authentication using CAPTCHA," ICETET10, published by IEEE Computer Society, 2010.

[4] A. A. Chandavale, A. M. Sapkal and R. M. Jalnekar, "A framework to analyze security of Text based CAPTCHA," Int. Journal of Forensics and Computer Application, Vol. 1, No. 27, pp. 127-133, 2010.

[5] A. A. Chandavale, A. M. Sapkal and R. M. Jalnekar, "Algorithm to break Visual CAPTCHA," ICET09, published by IEEE Computer Society, pp. 258-262, 2009, IEEE Xplore.
