16-Dec-12
Development of System to measure strength of CAPTCHA
By Anjali A. [email protected]
9881498695
Guide: Dr. A. M. Sapkal, Professor (E&T/C), COEP, Pune
Introduction
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart.
Existing Tools Vs Proposed System
Parameter | CAPTCHA Sniper | CAPTCHA Infinity | TesserCAP 1.0 | Proposed System
Objective | Break and bypass the CAPTCHA | Break and bypass the CAPTCHA | Break the CAPTCHA | Break and measure strength of the CAPTCHA
Cost | 96 dollars | 46 dollars | Free | Free
Human intervention | Yes (semi-automatic) | Automated | Yes (complete) | 50% (semi-automated)
Response time | seconds | seconds | seconds |
Limitation | Can break only disconnected CAPTCHA | Can break only disconnected CAPTCHA | Can break only disconnected CAPTCHA | Designed for disconnected, overlapped and connected CAPTCHA
Objectives
• Attack CAPTCHAs with variable length and connected characters.
• Attack CAPTCHAs having variations in color as noise.
• Help to build more robust CAPTCHAs while maintaining human friendliness.
• Provide security for socially relevant services so that users can continue to avail of free internet services.
Fig. 1 Proposed System (block diagram)
Input: TBC image
1. Preprocessing of image and noise calculation (output: cleaned image with type)
2. Segmentation of cleaned image and calculation of length of TBC (output: segmented image)
3. Recognition of segmented characters and measurement of response time
Output: type and amount of noise, length of TBC, recognized characters and their response time
Fig. 2 CBM
Performance Analysis
[Figure: performance charts]
Text-based CAPTCHA (TBC) is vulnerable to the proposed preprocessing, segmentation and character recognition (CR) attacks.
Contribution
The proposed CAPTCHA Breaker has the following improved features compared to existing tools:
• Determines strength measurement parameters.
• Categorizes TBC.
• Response time is in ms.
• Breaks connected TBC.
• Breaks variable-length TBC.
• Breaks TBC having a combination of light background and dark foreground of the same color, or a single character with multiple colors.
• Breaks various types of TBC, with a feature to load an image of a TBC directly.
Future Plan
• Breaking of segmentation-resistant TBC.
• Breaking of connected TBC images having variations in thickness and width of characters.
• Design and implement an algorithm for measuring the strength of TBC.
• Analyze the performance of the developed system.
• Noise
  • The use of color
  • Clutters
  • Confusing characters
• Characters used in TBC
  • Character set
  • Character length
  • Recognition rate
• Response Time
Characters used
[Figure: CAPTCHA having (a) disconnected characters, (b) overlapped characters (characters not at the same level), (c) connected characters]
Fig. 3 Preprocessing Attack (flowchart)
Steps: Start; convert TBC to grey scale; binarization; process for CAPTCHA identifier; type decision (Chess, Hollow or Normal); line and dot removal; discontinuity removal; Stop.
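The first two steps of the preprocessing attack, grey-scale conversion and binarization, can be sketched as follows. This is a minimal illustration, not the presented system's code: the image is assumed to be a list of rows of (r, g, b) tuples, the luminance weights are the common ITU-R BT.601 ones, and the fixed threshold of 128 is an assumption (the slides do not state how the threshold is chosen).

```python
def to_grey(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 grey levels."""
    # Luminance weights (ITU-R BT.601); an assumption, not taken from the slides.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def binarize(grey_image, threshold=128):
    """Map grey levels to 1 (dark, treated as foreground) or 0 (light)."""
    # The threshold value is a hypothetical default chosen for illustration.
    return [[1 if value < threshold else 0 for value in row]
            for row in grey_image]


if __name__ == "__main__":
    image = [[(0, 0, 0), (255, 255, 255)],
             [(200, 200, 200), (30, 30, 30)]]
    print(binarize(to_grey(image)))
```

Dark pixels are mapped to 1 here so that later steps can count foreground pixels with a plain sum; the opposite convention works equally well if used consistently.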
1. Calculate the width and height of the alternating color sections.
2. Calculate the average measure using eqn. 1:
   \[ \mathrm{avg} = \frac{1}{K} \sum_{i=1}^{K} w_i \]   (1)
   where K is the number of width samples taken and w_i is the i-th width sample.
3. Calculate the deviation of each measure from the average measure using eqn. 2:
   \[ D_i = \left| w_i - \mathrm{avg} \right| \]   (2)
4. Finally, obtain the ratio of the deviation to the average as shown in eqn. 3:
   \[ R_i = \frac{D_i}{\mathrm{avg}} \]   (3)
5. If the ratio is within a tolerance of 10%, the dimensions are considered equal or nearly equal.
6. Repeat steps 2-5 for height. If the check succeeds for both width and height, the cells are of fairly equal size and the CAPTCHA is identified as a BotDetect CAPTCHA; otherwise the test fails and the algorithm proceeds with the hollow CAPTCHA check.
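The equal-size check in the steps above can be sketched as follows. This is an illustrative reading, not the presented implementation: the deviation is taken as the absolute difference from the average, every deviation-to-average ratio must be at most 10%, and widths and heights are sampled from the first row and first column of a binary image. All three choices, and the function names, are assumptions.

```python
def run_lengths(line):
    """Lengths of consecutive same-colour runs along one row or column."""
    runs = []
    count = 1
    for previous, current in zip(line, line[1:]):
        if current == previous:
            count += 1
        else:
            runs.append(count)
            count = 1
    if line:
        runs.append(count)
    return runs


def nearly_equal_sizes(samples, tolerance=0.10):
    """True if every sample deviates from the average by at most `tolerance`."""
    k = len(samples)
    if k < 2:
        return False  # a single run cannot show an alternating pattern
    average = sum(samples) / k
    return all(abs(s - average) / average <= tolerance for s in samples)


def looks_like_chess(binary_image, tolerance=0.10):
    """Apply the width check, then the height check, as in steps 1-6."""
    widths = run_lengths(binary_image[0])                   # first row
    heights = run_lengths([row[0] for row in binary_image])  # first column
    return (nearly_equal_sizes(widths, tolerance)
            and nearly_equal_sizes(heights, tolerance))


if __name__ == "__main__":
    board = [[((x // 2) + (y // 2)) % 2 for x in range(8)] for y in range(8)]
    print(looks_like_chess(board))
```

A real image would need sampling over several rows and columns, since a single line can cross characters or clipped border cells.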
The algorithm for Hollow TBC identification is as follows:
1. Fill the background of the image with black color using the boundary fill algorithm.
2. Calculate the number of pixels inside the characters using eqn. 1:
   \[ P_{\mathrm{inside}} = H \times W - (X + Y) \]   (1)
   where H × W (height × width of the image in pixels) is the total number of pixels in the image, X is the number of pixels filled during boundary fill, and Y is the number of black pixels initially in the binary image.
3. Calculate the ratio of pixels inside the characters to the total number of pixels in the image as indicated in eqn. 2:
   \[ \mathrm{ratio} = \frac{P_{\mathrm{inside}}}{H \times W} \]   (2)
4. If the calculated ratio satisfies the observation, the CAPTCHA is considered Hollow; otherwise the check fails and the CAPTCHA is considered a CAPTCHA with clutter.
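The hollow check above can be sketched as follows. This is a minimal sketch under stated assumptions: the binary image is a list of rows with 1 for black and 0 for white, the background is every white region that touches the image border, and the fill uses 4-connectivity with a queue instead of a recursive boundary fill. The function name is hypothetical, and the ratio that counts as "hollow" is not given in the slides, so the function only returns the ratio.

```python
from collections import deque


def inside_pixel_ratio(binary_image):
    """Ratio of pixels enclosed by character outlines to all pixels."""
    height = len(binary_image)
    width = len(binary_image[0])
    total = height * width
    y_black = sum(sum(row) for row in binary_image)  # Y: initial black pixels

    filled = [row[:] for row in binary_image]
    queue = deque()
    # Seed the fill from every white pixel on the image border.
    for r in range(height):
        for c in range(width):
            on_border = r in (0, height - 1) or c in (0, width - 1)
            if on_border and filled[r][c] == 0:
                filled[r][c] = 1
                queue.append((r, c))
    x_filled = len(queue)  # X: pixels painted black by the fill

    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < height and 0 <= nc < width and filled[nr][nc] == 0:
                filled[nr][nc] = 1
                x_filled += 1
                queue.append((nr, nc))

    inside = total - x_filled - y_black  # white pixels the fill never reached
    return inside / total


if __name__ == "__main__":
    ring = [[0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 0, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]]
    print(inside_pixel_ratio(ring))
```

A solid character leaves no unreached white pixels and gives a ratio of zero, while an outlined character leaves its interior unfilled and gives a positive ratio.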
Preprocessing
Fig. 4 Preprocessing output: original image and image cleaned by the preprocessing attack
Fig. 5 Segmentation Attack (flowchart)
Process blocks: Start; segment image into a number of sub-images; segment sub-image from top; segment sub-image from bottom; take next sub-image; Stop.
Decisions: Sub-image has more than one character? Sub-image is connected from top? Sub-image is connected from bottom? Sub-image is the last character of the image?
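The first flowchart step, splitting the image into sub-images, is commonly done with a vertical projection. The sketch below shows only that initial split and assumes a binary image stored as a list of rows with 1 for foreground pixels. It does not reproduce the top and bottom segmentation of connected characters, whose details are not given in the slides, and the function names are hypothetical.

```python
def vertical_projection(binary_image):
    """Number of foreground pixels in each column."""
    return [sum(column) for column in zip(*binary_image)]


def split_on_empty_columns(binary_image):
    """Return (start, end) column ranges separated by empty columns."""
    projection = vertical_projection(binary_image)
    segments = []
    start = None
    for x, value in enumerate(projection):
        if value > 0 and start is None:
            start = x                      # a new sub-image begins
        elif value == 0 and start is not None:
            segments.append((start, x))    # end is exclusive
            start = None
    if start is not None:
        segments.append((start, len(projection)))
    return segments


if __name__ == "__main__":
    image = [[1, 1, 0, 0, 1, 0],
             [1, 0, 0, 0, 1, 1]]
    print(split_on_empty_columns(image))
```

Sub-images that still contain more than one character (connected or overlapped characters) have no empty column between them, which is why the flowchart adds further checks after this split.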
Accuracy is the number of characters segmented divided by the total number of characters in the test images. For example, every image in I-Tax has 6 characters. If the algorithm segments 40 characters from 10 images, the accuracy is 40 / (10 × 6) ≈ 0.67, or 67%.
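The accuracy figure above can be computed as shown in this small sketch; the function and parameter names are illustrative only, and it assumes every image has the same number of characters, as in the I-Tax example.

```python
def segmentation_accuracy(segmented_chars, image_count, chars_per_image):
    """Fraction of characters segmented out of all characters present."""
    total_chars = image_count * chars_per_image
    return segmented_chars / total_chars


if __name__ == "__main__":
    accuracy = segmentation_accuracy(40, 10, 6)
    print(f"{accuracy:.2%}")
```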
Name of Algorithm | Principle | Accuracy | Type of characters | Limitation
K-means Segmenter [35] | Cluster-based segmentation | 73% | Both discontinuous and continuous characters | Unstable and iterative process
Microsoft TBC segmenter [24] | Noise arcs are removed and candidate segments are identified by color filling | 67% | Discontinuous characters and characters having arcs | Overlapped arcs and connected characters
Skeletonization [21] | Skeleton of text, retaining sharpness, position and connectivity of images | 58% | Discontinuous and overlapped characters | Distortion in images
Nearest Neighbour | Euclidean distance | 79% | Discontinuous and continuous characters | Time consuming
Proposed Algorithm | Projection value, Snake Game | 85% | All | Variation in thickness of characters in an image
Comparison of Algorithms
[Figure: bar chart of accuracy in % (0 to 100) by type of algorithm (Proposed Algorithm, Snake Segmentation, Projection Based), with one series for connected characters and one for disconnected and overlapped characters]
CR by digitized pattern with ANN
• Calculate I and Matrix from binarized input character pattern.
• Calculate Weight matrix.
• Calculate Candidate Score.
• Calculate Ideal weight model score.
• Calculate the Recognition coefficient Q(K) as the ratio of the Candidate score to the Ideal weight model score for every learnt character K.
• The learnt character for which Q(K) is maximum is taken as the recognized character for the input candidate pattern.
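The slides name the quantities but do not give their formulas, so the sketch below follows one common formulation of recognition by digitized patterns and should be read as an assumption throughout: each training pattern is mapped to +1/-1 values and summed into a per-character weight matrix, the candidate score is the sum of the weights under the input's 1-pixels, and the ideal weight model score is the sum of the positive weights. Function names are hypothetical.

```python
def to_bipolar(pattern):
    """Map a 0/1 pattern to -1/+1 values (assumed training encoding)."""
    return [[1 if value else -1 for value in row] for row in pattern]


def learn_weights(samples):
    """Sum the bipolar versions of all training samples of one character."""
    rows, cols = len(samples[0]), len(samples[0][0])
    weights = [[0] * cols for _ in range(rows)]
    for pattern in samples:
        for r, row in enumerate(to_bipolar(pattern)):
            for c, value in enumerate(row):
                weights[r][c] += value
    return weights


def candidate_score(weights, pattern):
    """Sum of weights at the positions where the input pattern is 1."""
    return sum(w * v for w_row, p_row in zip(weights, pattern)
               for w, v in zip(w_row, p_row))


def ideal_score(weights):
    """Best score any pattern could reach: the sum of positive weights."""
    return sum(w for row in weights for w in row if w > 0)


def recognize(models, pattern):
    """Return (label, Q) for the learnt character with the largest Q(K)."""
    best_label, best_q = None, float("-inf")
    for label, weights in models.items():
        ideal = ideal_score(weights)
        if ideal == 0:
            continue  # skip degenerate models with no positive weights
        q = candidate_score(weights, pattern) / ideal
        if q > best_q:
            best_label, best_q = label, q
    return best_label, best_q


if __name__ == "__main__":
    letter_i = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
    letter_l = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
    models = {"I": learn_weights([letter_i]), "L": learn_weights([letter_l])}
    print(recognize(models, letter_i))
```

With this formulation Q(K) cannot exceed 1, and it reaches 1 only when the input has 1-pixels exactly where the learnt weights are positive.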
Publications
[1] A. A. Chandavale, A. M. Sapkal, "An Improved Adaptive Noise Reduction for Secured CAPTCHA", Fourth International Conference on Emerging Trends in Engineering & Technology (ICETET 2011), pp. 12-18, 2011, published by IEEE Xplore.
[2] A. A. Chandavale, A. M. Sapkal, "Reduced process thinning algorithm for CAPTCHA strength measurement", International Journal of Computer Science & Application, Vol. 1, No. 1, pp. 1-6, 2011.
[3] A. A. Chandavale, A. M. Sapkal, "Algorithm for secured online authentication using CAPTCHA", ICETET10, published by IEEE Computer Society, 2010.
[4] A. A. Chandavale, A. M. Sapkal and R. M. Jalnekar, "A framework to analyze security of Text based CAPTCHA", Int. Journal of Forensics and Computer Application, Vol. 1, No. 27, pp. 127-133, 2010.
[5] A. A. Chandavale, A. M. Sapkal and R. M. Jalnekar, "Algorithm to break Visual CAPTCHA", IEEE Xplore, ICET09, published by IEEE Computer Society, pp. 258-262, 2009.