aptcha - ucy · 2020. 4. 28. · xrumer • software for spamming, mostly forums and comment...

APTCHA I am Andreas Charalampous, April 2020 CS682 - Advanced Security Topics Instructor: Elias Athanasopoulos

Post on 09-Mar-2021


Page 1: APTCHA - UCY · 2020. 4. 28. · Xrumer • Software for spamming, mostly forums and comment sections. • Integrated support for bypassing many different anti-spam mechanisms, including

CAPTCHA

I am Andreas Charalampous, April 2020

CS682 - Advanced Security Topics

Instructor: Elias Athanasopoulos

Contents

1. Introduction to Captcha

2. Paper 1: Re: Captchas – Understanding Captcha-Solving Services in an economic context

3. Paper 2: I am Robot: (DEEP) Learning to break Semantic Image Captchas

1. Introduction to Captcha

i. Motivation

ii. Definition

iii. Type of Captcha Challenges

iv. reCaptcha

Motivation

• Using computers for bot fraud, attackers can attack at scale.

• Fake Registrations - Create multiple accounts automatically.

• Comment/Posting Spam.

• Purchase of tickets.

• Resource that has to be guarded.

• A defense mechanism is needed to distinguish computers from humans, letting humans in and keeping spammers away from the resources.

Definition of Captcha

• Captcha: Completely Automated Public Turing test to tell Computers and Humans Apart.

• Reverse Turing Test.

• Term coined by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford in 2003.

• Captchas protect open Web Resources from being exploited at scale.

• Challenge-Response to determine whether the user is human or not.

• A Captcha challenge must be hard for a bot to solve and, at the same time, easy for a human.

• Approximately 10 seconds for a human to solve a typical Captcha.

Type of Captcha Challenges

• First version of Captcha (v.1) is the “twisted text”, made in 1997.

• Earliest commercial use by idrive.com and PayPal, in 2002 and 2001 respectively.

• Math problems captchas.

• Audio captchas.

• Picture captchas.

Type of Captcha Challenges

Advertisement Captcha

Game Captcha

SlideLock Captcha

Drag-And-Drop Captcha

Trivial Captcha

reCaptcha

• Was developed by Luis von Ahn, David Abraham, Manuel Blum, Michael Crawford, Ben Maurer, Colin McMillen, and Edison in May 2007.

• It was acquired by Google in September 2009.

• Used for digitization of The New York Times archives and books from Google Books.

• Two of the reCaptcha challenges are image and distorted text identification.

No Captcha ReCaptcha

• Developed in 2014.

• Consists of a checkbox that the user is simply asked to click.

• Performs behavioral analysis in the browser to predict whether the user is human.

• Easier for humans.

• “Harder” for bots.

Evolution and Variety in Captchas

• Captchas have been evolving for more than 20 years and will keep doing so. There are many different kinds of captcha challenges.

• They keep improving, finding ways to be easier for users and more difficult for bots.

• They provide accessibility for users with impairments.

• Captchas keep being bypassed by automation software or solver services, creating an arms race between solvers and providers.

2. Paper 1: Re: Captchas – Understanding Captcha-Solving Services in an economic context

i. Introduction

ii. What is examined in this paper

iii. Automated Software Solvers

iv. Human Solver Services

v. Conclusion

Introduction

• Captchas attached value to the problem of solving them, creating an industrial market in which captcha providers and solvers compete.

• Providers face two types of solvers:

• Automated solving technology.

• Real-time human labor.

• Captchas are evaluated in economic terms.

What is examined in the paper?

• How this new market works.

• Service quality relative to price.

• Solving capacity of the market leaders.

• Details about solving services.

• How the two categories of solvers work:

• Automated solving:

• How it evolved.

• How the arms race favors the providers (defender).

• Human Labor:

• Why it surpassed automated solving.

• How the cost of it dropped significantly.

• Which Captchas are targeted most.

To further support the study

• Interviewed Mr. E., owner of a successful CAPTCHA-solving service. He provided validation of, and insight into, the underlying business processes.

• Studied the whole market, from all aspects and viewpoints.

• Purchased solving services from both categories and tested them.

• Became part of the human labor pool.

Automated Software Solvers

• Use segmentation algorithms – Optical Character Recognition (OCR)

• Complex.

• Fails to replicate human accuracy.

• Advantages:

• Near-zero cost. Only cost is in creating solver.

• Near-infinite capacity.

• Tested Xrumer and reCaptchaOCR.

Xrumer

• Software for spamming, mostly forums and comment sections.

• Integrated support for bypassing many different anti-spam mechanisms, including Captcha.

• Available since 2006; in 2010 it cost $540. The authors purchased it for evaluation.

• In 2008 it was capable of solving the Captchas of major message boards.

Xrumer Tests

• Tested on a netbook with a 1.6 GHz Intel Atom processor.

• On all but one captcha type it scored 100% accuracy, requiring 1 second or less per captcha.

• Only on phpBB, which uses the GD captcha generator with foreground noise, did it score 35% accuracy, requiring 6-7 seconds per captcha.

• Even though the scores are impressive, a couple of months later these captchas were updated, defeating Xrumer.

reCaptchaOCR

• Created in December 2009.

• Focused on reCaptcha.

• Developed to defeat early 2008 reCaptchas.

• Was able to defeat late 2009 reCaptchas.

• In early 2010 reCaptcha was updated, and reCaptchaOCR was unable to defeat it.

(a) Early 2008

(b) Late 2009

(c) Early 2010

reCaptchaOCR Tests

• Tested on a netbook with a 2.13 GHz Intel Core 2 Duo processor.

• Uses iteration to improve accuracy.

• With 613 iterations:

• 100 type (a) captchas: 30% solved.

• 100 type (b) captchas: 18% solved.

• Average 105 seconds per challenge.

• With 75 iterations:

• 100 type (a) captchas: 29% solved.

• 100 type (b) captchas: 17% solved.

• Average 12 seconds per challenge.
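One way to compare the two iteration settings above is the expected time spent per correct solution, that is, average seconds per challenge divided by accuracy. This metric is added here for illustration and is not reported on the slide; the function name is hypothetical, and independent attempts are assumed.

```python
def seconds_per_correct_solution(avg_seconds: float, accuracy: float) -> float:
    """Expected time per correct answer, assuming independent attempts."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return avg_seconds / accuracy


# (iterations, reCaptcha type) -> (average seconds per challenge, accuracy),
# taken from the slide above.
results = {
    (613, "a"): (105, 0.30),
    (613, "b"): (105, 0.18),
    (75, "a"): (12, 0.29),
    (75, "b"): (12, 0.17),
}

for (iterations, kind), (secs, acc) in results.items():
    cost = seconds_per_correct_solution(secs, acc)
    print(f"{iterations} iterations, type ({kind}): {cost:.1f} s per correct solution")
```

Under this reading, cutting the iterations from 613 to 75 costs about one percentage point of accuracy but reduces the time per correct solution by roughly a factor of eight.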

Conclusion

• Arms races traditionally favor the attacker. Here attackers have the more challenging recognition problem, while providers can be agile.

• Economics of automated solving are driven by several factors:

• Cost of developing new solvers.

• Accuracy of those solvers.

• Responsiveness of the sites whose captchas are attacked.

Human Solver Services

• Instead of using automated solving software, the workload of captchas is given to humans to solve.

• Opportunistically.

• On a “for hire” basis.

Opportunistic Solving

• Individual solving a Captcha as part of some other task.

• An attacker controlling a popular website might use its visitors to solve third-party Captchas by presenting them as the visitor’s challenge.

• Did not play a major role in the market.

Paid Solving

• Core of the CAPTCHA-solving ecosystem.

• Services pay individuals to solve captchas.

• Price is calculated as $X/1000, where X is the amount paid for solving 1000 Captchas.

• An advertisement in 2006 was looking for a full-time CAPTCHA solver for $10/1000.

Figure: paid solving workflow in eight numbered steps (1-8), linking decaptcher.com, DeCaptcher, PixProfit and workers all around the world.

Paid Solving Evolution

• From 2007 to 2010 the market expanded while wages declined.

• 2007: $10/1000.

• Mid-2008: $1.5/1000.

• Mid-2009: $1/1000.

• 2010: $0.75/1000 – $0.5/1000.

• Solving is an unskilled activity.

• Services preferred labor from Eastern Europe, Bangladesh, China, India, Vietnam.

• Competition drove wages down even further.
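To put the per-1000 prices above in perspective, they can be converted into an hourly income. The sketch below assumes a worker solves one captcha every 10 seconds without pause (the typical solving time quoted earlier in this deck); both that assumption and the function name are illustrative only.

```python
def hourly_wage(rate_per_1000: float, seconds_per_captcha: float = 10.0) -> float:
    """Dollars earned per hour at a given $/1000 rate and solving speed."""
    captchas_per_hour = 3600.0 / seconds_per_captcha
    return captchas_per_hour * rate_per_1000 / 1000.0


# $/1000 rates taken from the slide above.
rates = {"2007": 10.0, "mid-2008": 1.5, "mid-2009": 1.0, "2010 (low end)": 0.5}

for period, rate in rates.items():
    print(f"{period}: ${hourly_wage(rate):.2f} per hour")
```

Under these assumptions the hourly income falls from a few dollars in 2007 to well under a dollar by 2010.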

Solver Service Quality

• Evaluate 8 Paid Services:

• Antigate https://anti-captcha.com/

• BeatCaptchas https://beatcaptchas.com.cutestat.com/

• BypassCaptcha http://bypasscaptcha.com/

• CaptchaBot http://www.captchabot.com/

• CaptchaBypass – Ceased Operation during evaluation

• CaptchaGateway – Ceased Operation during evaluation

• DeCaptcher https://de-captcher.com/

• ImageToText – Ceased Operation

• Based on:

1. Customer Interface

2. Solution Accuracy

3. Response time

4. Capacity

5. Load and Availability

Verifying Results

• For each captcha, the most frequent solution from solvers is used.

• If more than one solution is tied for most frequent, all answers are treated as incorrect.

• Heuristic Evaluation:

• 1025 randomly selected captchas that had at least one solution were checked manually.

• 1009 correct.

• 16 incorrect.

• 6 of them because of character similarities (zero versus letter O (0 – o), six versus letter B (6 – b)).
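The verification heuristic above can be sketched as a majority vote that refuses to decide on ties. This is a minimal illustration rather than the authors' code; the function name is hypothetical, and lowercasing answers before counting is an assumption.

```python
from collections import Counter
from typing import Iterable, Optional


def consensus_solution(answers: Iterable[str]) -> Optional[str]:
    """Return the unique most frequent answer, or None on a tie or no answers."""
    # Assumption: answers are compared case-insensitively.
    counts = Counter(a.strip().lower() for a in answers)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: every answer is treated as incorrect
    return ranked[0][0]


print(consensus_solution(["xk4b", "XK4B", "xk46"]))
print(consensus_solution(["ab12", "cd34"]))
```

In the manual check reported above, 1009 of 1025 consensus answers were correct, which is about 98.4%.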

Customer Account Creation

• All of them required prepayment.

• Antigate and DeCaptcher offer bidding systems for higher-priority access when load is high.

• For most services, account registration is accomplished via Web and email.

• Some of them presented obstacles during registration:

• CaptchaBot and Antigate required third-party invitation codes.

• Antigate guards against Western users and required the name of the Prime Minister in Cyrillic.

• Some of them, like ImageToText, required a live phone call.

Evaluation Details

• Tested as a customer for about five months using captchas from 25 popular sites, including PayPal, eBay and Google.

• Submitted a single Captcha every five minutes to all services, recording the time submitted.

1. Customer Interface

• Most provide an API package for uploading Captchas and receiving results.

• Two ways of interacting with the services:

• API performs an HTTP POST that uploads the image and waits for the result in the HTTP response: BeatCaptchas, BypassCaptcha, CaptchaBypass and CaptchaBot.

• API performs one HTTP POST to upload the image, receives an image ID in the HTTP response and polls the site for the solution using the ID: Antigate, CaptchaGateway, ImageToText.
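The two interaction styles above differ only in who waits: the server (blocking) or the client (polling). The sketch below contrasts them using in-memory stand-ins instead of real HTTP calls; every class, function and parameter name is hypothetical and does not correspond to any service's actual API.

```python
import itertools
import time
from typing import Dict, List, Optional


def fake_human_solver(image: bytes) -> str:
    """Stand-in for a human worker: here the 'image' bytes already hold the answer."""
    return image.decode("ascii")


class BlockingService:
    """Style 1: a single request uploads the image and returns the solution."""

    def solve(self, image: bytes) -> str:
        return fake_human_solver(image)


class PollingService:
    """Style 2: upload returns an ID, and the client polls for the solution."""

    def __init__(self, polls_until_ready: int = 2) -> None:
        self._next_id = itertools.count(1)
        self._jobs: Dict[int, List] = {}
        self._polls_until_ready = polls_until_ready

    def upload(self, image: bytes) -> int:
        job_id = next(self._next_id)
        self._jobs[job_id] = [self._polls_until_ready, fake_human_solver(image)]
        return job_id

    def result(self, job_id: int) -> Optional[str]:
        job = self._jobs[job_id]
        if job[0] > 0:  # solution not ready yet
            job[0] -= 1
            return None
        return job[1]


def solve_by_polling(service: PollingService, image: bytes,
                     max_polls: int = 10, wait_s: float = 0.0) -> str:
    """Upload once, then poll with the returned ID until a solution arrives."""
    job_id = service.upload(image)
    for _ in range(max_polls):
        answer = service.result(job_id)
        if answer is not None:
            return answer
        time.sleep(wait_s)
    raise TimeoutError(f"no solution after {max_polls} polls")


print(BlockingService().solve(b"xk4b"))
print(solve_by_polling(PollingService(), b"xk4b"))
```

The polling style lets a customer keep many captchas in flight without holding one open connection per captcha, at the cost of extra requests.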

2. Solution Accuracy

Error rate for each combination of service and CAPTCHA type

2. Solution Accuracy

Median error rate for all services

Median error rate for all CAPTCHAs

3. Response Time

Median Response Time for every service

Median Response Time for all Captchas

3. Response Time

Response time for each combination of service and CAPTCHA type

4. Capacity

• Number of captchas solved in given time.

• Increase the load until the service is overloaded.

• Antigate has the best capacity, 27 to 41 captchas per second.

• 1,536 threads submitting with the bid set to $3/1000.

• Rejection rate very low.

• Around 400-500 workers served these requests; the actual number may be larger.

• DeCaptcher and CaptchaBypass sustained 14-15 captchas per second

• BeatCaptchas 8 and BypassCaptcha 4 captchas per second.
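The slide does not state how the 400-500 worker estimate for Antigate was obtained. One plausible reading, offered here as an assumption rather than as the authors' method, is Little's law: busy workers ≈ throughput × time per captcha. The 10 and 12 second solving times below are assumed values chosen for illustration.

```python
def workers_needed(captchas_per_second: float, seconds_per_captcha: float) -> float:
    """Little's law: average number of workers busy at any moment."""
    return captchas_per_second * seconds_per_captcha


peak_rate = 41  # captchas per second, Antigate's peak from the slide

for solve_time in (10, 12):  # assumed seconds per captcha
    print(f"{solve_time} s per captcha -> about "
          f"{workers_needed(peak_rate, solve_time):.0f} workers")
```

With these assumed solving times, the peak rate corresponds to roughly 410 to 490 simultaneously busy workers, which is consistent with the range quoted above.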

5. Load and Availability

• Customers can poll services for load reports.

• Examine how workers are affected by load.

Load per hour reported by Antigate (Left) and DeCaptcher (Right)

Workforce

• Examine solving services from the solver’s (worker) aspect.

• Draw demographic conclusions about the solvers, such as origin.

• Evaluate solvers’ adaptability.

• Identify the most targeted sites.

• Test two job sites:

• Kolotibablo http://kototibablo.com/

• PixProfit http://pixprofit.com/

Worker Interface

• An account must be created first.

• Web based Interface.

Worker Wages

• PixProfit: $1/1000

• Kolotibablo: $0.5/1000 - $0.75/1000

• Provides a list of the top 100 solvers per day.

• Average payout for 1 December 2009: $106.31

• Average payout for 1 January 2010: $47.32

Geo locating Workers

• Crafted captchas using words from various languages to reveal geographic demographics of solvers.

• Captchas showed a number written in different languages.

• Instructions were in the same language.

• Language Varieties:

• Prevalence of Web Native speakers: English, Chinese, Hindi.

• Regions with low-cost labor markets: India, China, Latin America.

• Developed Regions: Western Europe.

• Synthetic language: Klingon.

Geo locating Workers

Accuracy of each service on different language captchas

Adaptability

• Examined how services and solvers adapt to changes.

• Sent image captchas where solvers had to identify cats and dogs.

• Sent one captcha every 3 minutes to all services, for 12 days.

• ImageToText had average 39.9% success.

• BeatCaptchas had average 20.4% success.

• The rest had success below 7%.

Error Rate of ImageToText on image captchas

Targeted Sites

• Identified the targeted sites of Kolotibablo and PixProfit.

1. For 82 days, gathered from Kolotibablo and PixProfit 25K and 28K captchas respectively.

2. Grouped them by image dimensions.

3. Manually tried to identify sites with same dimensions.
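Steps 2 and 3 above amount to counting captchas per image size and then labelling each size with a site that is known to use it. The sketch below assumes the image dimensions have already been extracted; the sizes, site names and function name are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def rank_targets(sizes: Iterable[Size],
                 known_sites: Dict[Size, str]) -> List[Tuple[str, int]]:
    """Group captchas by dimensions and label each group, most frequent first."""
    counts = Counter(sizes)
    return [
        (known_sites.get(size, "unidentified %dx%d" % size), n)
        for size, n in counts.most_common()
    ]


# Hypothetical data: dimensions of collected captchas and a manually built mapping.
collected = [(300, 57), (300, 57), (200, 70), (120, 40)]
manual_mapping = {(300, 57): "site-A", (200, 70): "site-B"}

for label, n in rank_targets(collected, manual_mapping):
    print(f"{label}: {n} captchas")
```

Matching on dimensions alone is a heuristic: two sites that happen to share a captcha size cannot be told apart this way, which is why the last step was manual.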

Conclusion

• The nature of captchas made solving them easy to outsource to the global unskilled labor market.

• There is a whole highly competitive business market for solving captchas, following different models.

• Do Captchas work?

1. Telling computers and humans apart: Succeeded

2. Preventing automated site access: Failed

3. Limiting automated site access: Debatable

3. Paper 2: I am Robot: (DEEP) Learning to break Semantic Image Captchas

i. What is examined in this paper?

ii. reCaptcha Analyzed

iii. System Overview

iv. Automated Solving Image reCaptcha

v. Influencing Advanced Risk Analysis System

vi. Guidelines and Countermeasures

What is examined in this paper?

• Explore Google’s Advanced Risk Analysis System (ARAS) used on the latest version of reCaptcha.

• How it works.

• Flaws.

• Methods to influence it.

• Design of novel low-cost attack using deep learning technologies for the image reCaptcha.

• Introduce new safeguards and modifications for preventing the manipulation of ARAS and mitigating attacks on image reCaptcha.

reCaptcha Analyzed

• ReCaptcha is the most widely used captcha service.

• 200 million reCaptchas are solved every day.

• Many captchas deter valid users from visiting a website.

• Automated solvers are less lucrative than human solvers.

• The motivation is to make challenges easier for valid users and at the same time harder for fraudulent ones, human or automated.

• Advanced Risk Analysis System:

• Acquires user information from Google tracking cookies and the browser, even when the user is not logged in or is browsing in incognito mode.

• Based on the above, ARAS presents the user with an easy challenge, a hard challenge or no challenge at all.

reCaptcha Workflow

• A site that protects resources with reCaptcha contains a reCaptcha widget.

• Widget collects information about the user’s browser and checks for automation kits.

1. The user clicks on the checkbox, and a request is sent to Google containing:

• Referrer

• Sitekey

• Cookie

• Information gathered by widget.

2. The above are checked by ARAS, and an HTML frame containing the corresponding challenge is sent.

reCaptcha checkbox

reCaptcha Workflow (cont.)

3. When the checkbox is clicked, the HTML field recaptcha-token is populated with a token.

• If the user is deemed legitimate, Google marks the token as valid.

• If not, it remains invalid until the user solves the challenge.

4. The token is then submitted to the site.

5. Website sends a verification request to Google.

6. Google sends a response, a JSON object with a boolean field indicating whether the verification succeeded.

• If the verification fails, error codes offer more information.

• Solution must be provided in 55 seconds.

• If not, user clicks on checkbox again to get a new challenge.
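Steps 5 and 6 above can be sketched from the website's side. The endpoint URL and the `secret`, `response`, `success` and `error-codes` names below follow Google's public reCaptcha verification documentation rather than the slides; the helper function names are hypothetical, and no network request is actually made here.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode

# Endpoint documented by Google for server-side verification (POST request).
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_body(secret: str, token: str) -> bytes:
    """Form-encoded body the website would POST to VERIFY_URL (step 5)."""
    return urlencode({"secret": secret, "response": token}).encode("ascii")


def parse_verify_response(raw: str) -> Tuple[bool, List[str]]:
    """Extract the success flag and any error codes from the JSON reply (step 6)."""
    data = json.loads(raw)
    return bool(data.get("success", False)), list(data.get("error-codes", []))


# Example replies, written by hand for illustration.
print(parse_verify_response('{"success": true}'))
print(parse_verify_response('{"success": false, "error-codes": ["timeout-or-duplicate"]}'))
```

A site would only grant access to the protected resource when the first element of the returned tuple is true.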

reCaptcha Challenges

1. No captcha reCaptcha

2. Image reCaptcha

3a. Scanned words

3b. Street view numbers

3c. Distorted one-word

3d. Distorted two-word

3e. Fallback captcha

System Overview

• Built on Selenium, specifically Mozilla Webdriver (Mozilla Firefox v.36)

• Functionality for locating specific HTML DOM elements.

• Features for executing JavaScript.

• Controllers for handling keyboard and mouse events.

• Easily saving and loading browser cookies.

• Has two main components:

1. Cookie Manager.

2. Recaptcha Breaker.

Cookie Manager

• Each cookie can receive up to 8 checkbox captchas per day.

• Around 63,000 cookies per day are required.

• Cookies are automatically created and trained on a virtual machine so that they appear to belong to a real user.

• System configured to perform specific human-like actions:

• Mimicking a diurnal cycle, with random resting intervals between actions.

• Searching Google for certain terms and following links.

• Opening videos on YouTube.
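The cookie-training behavior above (a diurnal cycle with random rests between actions) could be planned along the following lines. This is a sketch under stated assumptions, not the paper's implementation: the waking hours, rest bounds, action names and function name are all hypothetical, and the plan is only generated, not executed in a browser.

```python
import random
from typing import List, Tuple

# Hypothetical action labels standing in for the behaviors listed above.
ACTIONS = ["google_search", "follow_link", "open_youtube_video"]


def plan_day(rng: random.Random, wake_hour: int = 8, sleep_hour: int = 23,
             min_rest_s: int = 30, max_rest_s: int = 900) -> List[Tuple[int, str]]:
    """Return (second-of-day, action) pairs confined to the waking hours."""
    t = wake_hour * 3600
    end = sleep_hour * 3600
    plan: List[Tuple[int, str]] = []
    while True:
        t += rng.randint(min_rest_s, max_rest_s)  # random resting interval
        if t >= end:
            return plan
        plan.append((t, rng.choice(ACTIONS)))


day = plan_day(random.Random(0))
print(len(day), "actions planned; first three:", day[:3])
```

Nothing is scheduled between `sleep_hour` and `wake_hour`, which is what produces the day and night pattern.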

Recaptcha Breaker

• Uses the cookies from manager.

1. Visits sites that employ reCaptchas.

2. The system locates the checkbox element through recaptcha-anchor and performs mouse click action.

3. In case of checkbox challenge, recaptcha-token is extracted.

• In case of image captcha, it is passed to another module.

Breaking the image captcha

• In case of an image challenge, a popup is created in the goog-bubble-content element. Inside the popup there is an iframe with the challenge.

• To identify the challenge, the system looks for:

• rc-imageselect: image captchas

• rc-defaultchallenge-response-field: text captchas

• Image captcha elements:

• Hint: rc-imageselect-desc

• Candidate Images: rc-imageselect-tile

• Verification Button: recaptcha-verify-button
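The challenge-type check above reduces to testing which marker is present in the challenge frame. In the described system these names would be looked up in the DOM through Selenium; in the sketch below a plain collection of names stands in for that lookup, and the function name is hypothetical.

```python
from typing import Iterable

IMAGE_MARKER = "rc-imageselect"
TEXT_MARKER = "rc-defaultchallenge-response-field"


def identify_challenge(names_in_frame: Iterable[str]) -> str:
    """Classify a challenge from the element names found in its iframe."""
    names = set(names_in_frame)
    if IMAGE_MARKER in names:
        return "image"
    if TEXT_MARKER in names:
        return "text"
    return "unknown"


print(identify_challenge(["rc-imageselect", "rc-imageselect-desc",
                          "rc-imageselect-tile", "recaptcha-verify-button"]))
print(identify_challenge(["rc-defaultchallenge-response-field"]))
```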

Image Tags

• Goal: Using Image Annotation Module, get tags of candidate images that match the given hint.

• All extracted images are passed to an Image Annotation Module.

• Clarifai: 20 tags with confidence score.

• Alchemy: up to 8 tags with confidence score.

• TDL: 8 tags with confidence score.

• NeuralTalk: free-form description.

• Caffe: 10 labels, 5 with high score, 5 more specific with lower score.

• Google Reverse Image Search (GRIS) was also used to obtain descriptions and page titles. If found, a higher-quality version of the image is obtained as well.

Tag Classifier

• Implemented a tag classifier, which allows the system to select images with similar content when the tags do not match the hint.

• Classifier guesses the content using a subset of the given tags.

History Module

• Many images in captchas are repeated.

• A labelled dataset is created containing images and their tags.

• Each image’s hint is stored in hint_list.

Automated Solving Image reCaptcha

• Each candidate image will be assigned to one of 3 sets: Select, Discard, Undecided.

• Initially all candidate images are placed in Undecided.

1. If a hint is not provided, the sample image is looked up in the labelled dataset to obtain one.

2. Information about all images is collected from GRIS.

3. Every candidate image is looked up in the labelled dataset of the history module.

• If found, its tags are compared to the hint and, if they match, the candidate image is placed in the select set.

• If not, hint_list is checked and, if a match is found, the candidate image is placed in the discard set.

Automated Solving Image reCaptcha (cont.)

4. The image annotation module processes all images and assigns tags.

• If the tags match the hint, the image is added to select.

• If they match one of the other hints in hint_list, it is added to discard.

5. The system picks from the select set; if there are not enough images, it picks from undecided.
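Steps 3 to 5 of the procedure above can be sketched as follows. This is one interpretation, not the authors' code: steps 1 and 2 (hint recovery and GRIS lookups) are left out, tag matching is reduced to a case-insensitive exact comparison, only still-undecided images are annotated, and every name except `hint_list` is hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def tags_match(tags: Set[str], hint: str) -> bool:
    """Simplified matching: the hint appears verbatim among the tags."""
    return hint.lower() in {t.lower() for t in tags}


def solve_image_challenge(candidates: Iterable[str], hint: str, hint_list: List[str],
                          history: Dict[str, Set[str]],
                          annotate: Callable[[str], Set[str]],
                          needed: int) -> Tuple[List[str], List[str]]:
    """Return (images to click, discarded images)."""
    select: List[str] = []
    discard: List[str] = []
    undecided: List[str] = list(candidates)  # initially everything is undecided
    other_hints = [h for h in hint_list if h.lower() != hint.lower()]

    def place(image: str, tags: Set[str]) -> None:
        if tags_match(tags, hint):
            select.append(image)
            undecided.remove(image)
        elif any(tags_match(tags, h) for h in other_hints):
            discard.append(image)
            undecided.remove(image)

    # Step 3: reuse tags of images already seen by the history module.
    for image in list(undecided):
        if image in history:
            place(image, history[image])

    # Step 4: annotate whatever is still undecided.
    for image in list(undecided):
        place(image, annotate(image))

    # Step 5: take the select set, topping up from undecided if it is too small.
    missing = max(0, needed - len(select))
    return select + undecided[:missing], discard


fake_tags = {"c": {"wine"}, "d": {"tree"}}
picked, dropped = solve_image_challenge(
    ["a", "b", "c", "d"], hint="wine", hint_list=["wine", "pasta"],
    history={"a": {"wine", "glass"}, "b": {"pasta"}},
    annotate=lambda img: fake_tags[img], needed=3)
print(picked, dropped)
```

In the example, images "a" and "c" match the hint, "b" matches a different known hint and is discarded, and "d" is taken from the undecided set only because three selections are required.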

Influencing Advanced Risk Analysis System

• Different approaches to influence ARAS:

• Browsing History

• Google Account Usage

• Geo location

• Browser Checks:

1. Automation

2. User-Agent

3. Screen Resolution

4. Mouse

5. Cookie Reputation

6. Site restriction

7. Token Harvesting

ARAS Influence Evaluation: Browsing History

• Quantify the minimum amount of browsing history needed in order to get a checkbox captcha.

• Multiple network connection setups.

• Tor connections, with exit nodes in the USA.

• Result: ARAS is neutralized if the appended cookie is 9 days old, no matter the network connection.

• Even without browsing.

ARAS Influence Evaluation: Google Account Usage

• Tried various accounts, with different settings.

• Without phone verification.

• With verified phone.

• With alternative email from another provider.

• Result: With an account, no matter the setting, after 60 days ARAS gives a checkbox captcha.

• Conclusion: It is easier not to use an account at all.

ARAS Influence Evaluation: Cookie Geolocation

• Used Tor to create cookies from different regions.

• Result: No restrictions on the location of the IP address used to create the cookie.


ARAS Influence Evaluation: Browser Checks

• Automation:

• WebDriver sets the webdriver attribute to TRUE when the browser is under automation.

• Manually set the attribute to TRUE, using JavaScript.

• Result: No difference, checkbox captcha provided.

• Screen Resolution:

• Tried various resolutions, from 1x1 to 4096 x 2160.

• Result: No difference, checkbox captcha provided.


ARAS Influence Evaluation: Browser Checks

• User-Agent:

• User-Agent is compared to the Canvas Fingerprint for validity.


ARAS Influence Evaluation: Browser Checks

• Mouse: Tried different behaviors to check if ARAS is affected.

• Timing of movements.

• Erratic Movement Patterns.

• Multiple clicks in widget and checkbox.

• Used the getElementById().click() JavaScript call to simulate clicking without hovering.

• Result: None of the above had a negative effect.


ARAS Influence Evaluation: Browser Checks

• Token Harvesting:

• Tested whether creating a large number of cookies from a single IP address is prohibited.

• Result: 63,000 cookies created per day without getting blocked. The only restriction appeared when issuing concurrent requests.

• Selling token harvesting attacks at $2 per 1,000, an attacker could make $104-110 daily, or more by running multiple attacks.
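
The revenue estimate is a simple price-times-volume calculation. The sketch below only rearranges the figures on the slide: at $2 per 1,000, a daily income of $104-110 corresponds to 52,000-55,000 units sold per day. That volume is derived here from the slide's numbers and is not stated on it; the function names are illustrative.

```python
def daily_revenue(units_per_day: int, price_per_thousand: float) -> float:
    """Income in dollars for a given daily volume and price per 1,000 units."""
    return units_per_day / 1000 * price_per_thousand


def units_needed(revenue: float, price_per_thousand: float) -> float:
    """Daily volume required to reach a given income at that price."""
    return revenue / price_per_thousand * 1000


PRICE = 2.0  # dollars per 1,000, as quoted on the slide

low_volume = units_needed(104, PRICE)    # volume implied by $104 per day
high_volume = units_needed(110, PRICE)   # volume implied by $110 per day
```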


ARAS Influence Evaluation: Maximum Number of Checkbox Captchas

• Identify how many checkbox captchas can be solved in a day without being blocked.

Checkbox captchas obtained per minute


ARAS Influence Evaluation: Overall Evaluation

• ReCaptcha suffers from significant flaws and omissions.

• In an attempt to reduce the burden on legitimate users, attacks were made possible.

• The checks performed can be used to introduce more safeguards.


Automated Solving Image reCaptcha: Evaluation

• The image captcha breaking is evaluated based on these aspects:

• Solution flexibility: how many wrong answers are allowed.

• Image Repetition: at what rate challenges/images are repeated.

• Live Attack: Real attack results.


Automated Solving Image reCaptcha: Solution Flexibility

• Manually solved image challenges using different combinations of correct and wrong selections.

• 74% of the image captchas had 2 correct images out of 9 candidates, the rest had 3-4.

• Based on these results, the system was set to select 3 images.

Combinations of correct and wrong answers that pass image reCaptcha


Automated Solving Image reCaptcha: Image Repetition

• Searched for challenges with identical MD5 values.

• From 700 captchas, found 6 pairs of identical challenges, across 2 different sites within two hours.

• Conclusion: Challenges are not created on-the-fly, but drawn from a small pool of challenges.

• Searched for repeated images using perceptual hashes.

• Identified 1,368 identical images in total.

• 358 different repeating images.

• The most repeated image was found 92 times.
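
The two searches above differ in what they tolerate: MD5 only catches byte-identical data, while a perceptual hash also catches images that look the same but differ slightly in encoding. The sketch below is a minimal illustration under stated assumptions. The slide does not say which perceptual hash was used, so a basic average hash stands in here, and images are represented as small grayscale pixel grids (in practice they would first be downscaled, for example to 8 x 8). All function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def md5_groups(blobs: Dict[str, bytes]) -> List[List[str]]:
    """Group names whose raw bytes share the same MD5 digest."""
    groups = defaultdict(list)
    for name, data in blobs.items():
        groups[hashlib.md5(data).hexdigest()].append(name)
    return [names for names in groups.values() if len(names) > 1]


def average_hash(pixels: List[List[int]]) -> int:
    """One bit per pixel: 1 if brighter than the image mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Tiny 2 x 2 grids: img2 is img1 with slight pixel noise, img3 is different.
img1 = [[10, 200], [220, 30]]
img2 = [[12, 198], [215, 33]]
img3 = [[200, 10], [30, 220]]

exact_duplicates = md5_groups({"a": b"x", "b": b"x", "c": b"y"})
near_distance = hamming(average_hash(img1), average_hash(img2))
far_distance = hamming(average_hash(img1), average_hash(img3))
```

A small Hamming distance between two hashes marks the images as likely repeats; the threshold to use is a tuning choice.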


Automated Solving Image reCaptcha: Attack Simulation

Accuracy of simulated attack for different combinations of modules against the image reCaptcha


Automated Solving Image reCaptcha: Live Attack

• Used Clarifai.

• Labelled dataset: Manually labelled 3000 images from challenges, with a tag from hint_list.

• Ran the attack on 2,235 captchas, scoring 70.78% accuracy.

• Better results because of repetition (found 1,515 sample images and 385 candidate images in the labelled dataset).

• Also found 4 pairs of identical challenges. Google doesn’t remove challenges even if completed correctly.


Live Attack: Time and Hints Repetition

Cumulative distribution of time required for each step

Frequency and success rate for each type of hint


Facebook Captcha Attack

• Facebook uses captchas to prevent bots from sending suspicious URLs and spam.

• Resizes images dynamically in HTML, allowing access to high resolution versions.

• May have 2 to 10 correct images, 5 - 7 in most cases.


Facebook Captcha Attack

Attack accuracy against Facebook’s image captcha


Guidelines and Countermeasures

• Token Auctioning:

• The token verification API has an optional field for comparing the IP address of the user who solved the captcha with the one that submitted the token. It should be mandatory.

• Risk Analysis:

• Account:

• Users who are not logged in will have to solve the hardest challenge.

• Limit number of tokens per IP address.

• Cookie Reputation:

• The number of cookies that can be created within a time period should be regulated.

• Browser Checks:

• Take a stricter approach and return no challenge if the client is overtly suspicious.

• E.g. a mismatch between the browser and its User-Agent.
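
For the token auctioning point, a site can already choose to always send the solver's IP address when verifying a token. The sketch below assumes the public reCAPTCHA siteverify endpoint and its `secret`, `response` and `remoteip` parameters; the helper names are hypothetical, and how the service acts on `remoteip` is decided by the provider, not by this code.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_payload(secret: str, token: str, client_ip: str) -> Dict[str, str]:
    """Build the verification form data, refusing to omit the client IP."""
    if not client_ip:
        # Treat the optional field as mandatory on the site's side.
        raise ValueError("client IP is required for token verification")
    return {"secret": secret, "response": token, "remoteip": client_ip}


def verify(secret: str, token: str, client_ip: str, timeout: float = 5.0) -> bool:
    """POST the token to the verification endpoint and return its verdict."""
    data = urllib.parse.urlencode(build_payload(secret, token, client_ip)).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=timeout) as resp:
        return bool(json.load(resp).get("success"))
```

Calling `verify(site_secret, token, request_ip)` from the form handler, with `request_ip` taken from the incoming connection, passes along the address of the client that submitted the token.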


Guidelines and Countermeasures (cont.)

• Image captcha attacks:

• Remove flexibility.

• Increase the number and the range of correct images.

• Repetition:

• When a challenge is shown, it should be removed from the pool.

• The pool of challenges should be larger.

• Hint and Content:

• The hint should be removed.

• Providers can run experiments to find image categories that are problematic for image annotation software.

• Content homogeneity:

• Populate challenges with filler images of the same category as solutions.


Guidelines and Countermeasures (cont.)

• Advanced Semantic Relations:

• Instead of similar objects, user could be asked to select semantically related objects.

• Adversarial Images:

• By altering a small number of pixels, images are misclassified while remaining visually the same.

• Introducing noise:

• Experimented with a random grid, with varying parameters.

• The grid reduces the probability of retrieving higher resolution images.

• Images will have to be cleaned first, so computational cost is added.
