Breaking CAPTCHAs on the Dark Web
Using neural networks to enable scraping
RP #62, Kevin Csuka & Dirk Gaastra
Supervisor: Yonne de Bruijn, Fox-IT
6 February 2018
University of Amsterdam
Introduction
Scraping the Dark Web
Useful for threat intelligence companies...
... but the content is sometimes hard to get to.
Blockades such as CAPTCHAs are the main issue for scrapers.
CAPTCHA
Figure 1: CAPTCHA example
• Completely Automated Public Turing test to tell Computers and Humans Apart
• A test to determine whether the user is human or not
Main question
How would a scraper be able to circumvent CAPTCHAs that prevent it from properly scraping dark web websites?
Sub-questions:
1. What is the impact of solving CAPTCHAs?
2. Can CAPTCHAs be solved using Optical Character Recognition (OCR)?
3. Can CAPTCHAs be solved using Machine Learning (ML)?
Related Work
1. Lawrence et al. built their own dark web scraping tool, D-miner; CAPTCHAs were solved by human labor [1]
2. Ryan Mitchell demonstrated how to solve CAPTCHAs using Optical Character Recognition with Tesseract [2]
3. Arun Patala previously used Torch to train a neural network to solve CAPTCHAs [3]
Methods
Two methods are used to answer these questions:
1. Categorizing dark web websites
2. Breaking CAPTCHAs
1. Categorizing websites
Analysis of 633 dark web websites
• Which ones are up?
• Are there any duplicates?
• Which ones block scraping?
• What kind of blockade are they using?
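The four questions above can be answered per site by a small classifier over the fetched HTML. The following is a minimal sketch, not the authors' tooling: it assumes every site has already been fetched (for example through Tor) into a mapping from onion address to HTML, or `None` if the site was down. The keyword rules in `BLOCKADE_PATTERNS` and the category names are hypothetical illustrations.

```python
import hashlib
import re
from typing import Dict, Optional

# Hypothetical keyword heuristics; a real study would refine them by manual review.
BLOCKADE_PATTERNS = {
    "captcha": re.compile(r"captcha", re.IGNORECASE),
    "login": re.compile(r"type=[\"']?password", re.IGNORECASE),
}


def categorize(pages: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Label each site as 'down', 'duplicate', a blockade type, or 'open'."""
    seen_hashes = set()
    labels = {}
    for address, html in pages.items():
        if html is None:                      # Which ones are up?
            labels[address] = "down"
            continue
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if digest in seen_hashes:             # Are there any duplicates?
            labels[address] = "duplicate"
            continue
        seen_hashes.add(digest)
        # Which ones block scraping, and with what kind of blockade?
        # Patterns are checked in order, so a login form with a CAPTCHA counts as 'captcha'.
        for blockade, pattern in BLOCKADE_PATTERNS.items():
            if pattern.search(html):
                labels[address] = blockade
                break
        else:
            labels[address] = "open"
    return labels
```

Hashing the raw HTML only catches byte-identical mirrors; pages that embed per-request tokens would need a fuzzier comparison.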
2. Breaking CAPTCHAs
There are three common approaches to defeating CAPTCHAs:
1. Using a service that solves CAPTCHAs through human labor
2. Exploiting bugs in the implementation that allow the attacker to bypass the CAPTCHA
3. Using character recognition software to solve the CAPTCHA
2. Breaking CAPTCHAs - Dataset
Two common types of CAPTCHA were tested:
Figure 2: CAPTCHA set 1, generated with PHP
Figure 3: CAPTCHA set 2, generated with Python
2. Breaking CAPTCHAs
Figure 4: Training the neural network
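The slides do not describe the network architecture. A common setup for fixed-length text CAPTCHAs, used here purely as an assumption, is a multi-output classifier whose training target is one one-hot vector per character position, concatenated. The sketch below shows only that label encoding and the matching decoding of the network's output; `CHARSET` and `CAPTCHA_LENGTH` are hypothetical values.

```python
import string
from typing import List

# Hypothetical parameters: adjust to the CAPTCHA set being trained on.
CHARSET = string.digits + string.ascii_lowercase   # 36 symbols
CAPTCHA_LENGTH = 5


def encode_label(text: str) -> List[int]:
    """Turn an answer such as 'a3k9z' into CAPTCHA_LENGTH concatenated one-hot vectors."""
    if len(text) != CAPTCHA_LENGTH:
        raise ValueError(f"expected {CAPTCHA_LENGTH} characters, got {len(text)}")
    vector = [0] * (CAPTCHA_LENGTH * len(CHARSET))
    for position, char in enumerate(text):
        # CHARSET.index raises ValueError for symbols outside the assumed charset.
        vector[position * len(CHARSET) + CHARSET.index(char)] = 1
    return vector


def decode_prediction(scores: List[float]) -> str:
    """Pick the highest-scoring symbol within each position's slice of the output."""
    size = len(CHARSET)
    chars = []
    for position in range(CAPTCHA_LENGTH):
        chunk = scores[position * size:(position + 1) * size]
        chars.append(CHARSET[chunk.index(max(chunk))])
    return "".join(chars)
```

Because the CAPTCHAs are generated locally, the answer for every training image is known, so these target vectors come for free.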
2. Breaking CAPTCHAs
Figure 5: Login web page with generated CAPTCHA
2. Breaking CAPTCHAs
Figure 6: Workflow of solving a CAPTCHA with TensorFlow via Scrapy
Results
1. Categorizing websites
Figure 7: Percentage of scraping blockades using CAPTCHAs (n = 465)
Figure 8: Percentage of scraping blockades using CAPTCHAs (n = 465, n = 55)
2. Breaking CAPTCHAs - TensorFlow vs. Tesseract
Figure 9: Success rate of Tesseract and TensorFlow (n = 1,000), higher is better
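The slides do not define "success" for Figure 9. Assuming it means that the predicted string matches the answer exactly, the success rate over the n test CAPTCHAs is just the fraction of exact matches. A minimal sketch (the function name is illustrative):

```python
from typing import Sequence


def success_rate(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of CAPTCHAs whose prediction equals the answer exactly (assumed definition)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one CAPTCHA")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)
```

Exact matching is the strictest choice: a single wrong character fails the whole CAPTCHA, which mirrors how a web form would reject it. Comparing case-insensitively would be a reasonable variant for CAPTCHAs that ignore case.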
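Levenshtein distance: the minimal number of single-character edits (insertions, deletions or substitutions) needed to turn the output into the correct result [5]
E.g. kitten to mitten = 1 (substitute k with m)
Figure 10: Combined Levenshtein distance, lower is better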
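The Levenshtein distance can be computed with the standard dynamic-programming (Wagner-Fischer) algorithm, keeping only one row of the table at a time. How the per-CAPTCHA distances were combined for Figure 10 is not stated; summing them over all test images, as `combined_distance` does below, is an assumption.

```python
from typing import Sequence


def levenshtein(source: str, target: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning source into target."""
    # previous[j] holds the distance between the processed prefix of source and target[:j].
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # distance from source[:i] to the empty string
        for j, t_char in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,                       # delete s_char
                current[j - 1] + 1,                    # insert t_char
                previous[j - 1] + (s_char != t_char),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def combined_distance(predictions: Sequence[str], answers: Sequence[str]) -> int:
    """Sum of per-CAPTCHA edit distances (assumed aggregation; lower is better)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    return sum(levenshtein(p, a) for p, a in zip(predictions, answers))
```

Unlike the success rate, this metric rewards near misses: a prediction with one wrong character scores 1 rather than counting as a plain failure.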
Conclusion
• Circumventing CAPTCHAs is necessary to scrape the blocked parts of websites
• Machine learning (TensorFlow) is the most effective approach
• However, if immediacy takes precedence over success rate and accuracy, Tesseract (OCR) might be the better option
Future Research
A more granular analysis of dark web websites:
• What content do they host?
• Is any content hidden due to a lack of privileges?
Increase readability for Tesseract by "cleaning up" the image
Figure 11: Removing noise from CAPTCHA [6]
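One simple form of this clean-up, sketched here as an assumption rather than the method of [6], is to binarize the grayscale image with a fixed threshold and then erase isolated "speckle" pixels that have no ink among their eight neighbours. The image is represented as nested lists so the sketch runs without extra libraries; in practice it would be loaded with an imaging library (for example Pillow, converted to mode "L"). The threshold of 128 is a hypothetical default.

```python
from typing import List

Grid = List[List[int]]


def binarize(gray: Grid, threshold: int = 128) -> Grid:
    """Map grayscale pixels (0 = black, 255 = white) to 1 for ink and 0 for background."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def remove_isolated_pixels(binary: Grid) -> Grid:
    """Clear ink pixels that have no ink among their 8 neighbours (speckle noise)."""
    if not binary:
        return []
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            has_neighbour = any(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0
    return cleaned
```

Noise lines that cross the characters are not removed by this filter; they need larger morphological operations, which is why heavier CAPTCHA distortions remain hard for OCR.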
Achieve a more efficient training model by using character segmentation
Figure 12: CAPTCHA character segmentation [7]
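A basic segmentation technique, assumed here (the approach in [7] may differ), is a vertical projection: count the ink pixels in each column of the cleaned, binarized image and cut wherever empty columns separate two characters. The network then only has to classify single characters instead of whole strings. Touching or overlapping characters defeat this simple cut. `min_width` is a hypothetical parameter for discarding leftover noise slivers.

```python
from typing import List, Tuple

Grid = List[List[int]]


def segment_columns(binary: Grid, min_width: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of runs of columns containing ink."""
    if not binary:
        return []
    width = len(binary[0])
    # Vertical projection: number of ink pixels in each column.
    projection = [sum(row[x] for row in binary) for x in range(width)]
    segments = []
    start = None
    for x, count in enumerate(projection + [0]):  # sentinel 0 closes a trailing segment
        if count and start is None:
            start = x
        elif not count and start is not None:
            if x - start >= min_width:  # drop slivers narrower than min_width
                segments.append((start, x))
            start = None
    return segments
```

Each character image can then be cropped as `[row[start:end] for row in binary]` and fed to the classifier.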
Try more CAPTCHAs:
• Of increased difficulty
• If software to generate the CAPTCHAs, including the answers, is not available, send a training set to be solved by human labor. This costs money: $1.39 per 1,000 images [8]
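At the quoted rate, the labelling cost grows linearly with the size of the training set. A small sketch of that arithmetic, assuming the $1.39 per 1,000 images from [8] applies uniformly (real pricing may vary by service and over time). `Decimal` avoids floating-point rounding artifacts in currency amounts.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_1000 = Decimal("1.39")  # USD per 1,000 images, the rate quoted above [8]


def labelling_cost(images: int) -> Decimal:
    """Cost in USD of having `images` CAPTCHAs solved by human labor, rounded to cents."""
    cost = RATE_PER_1000 * images / 1000
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, a training set of 10,000 images would cost about $13.90, and 100,000 images about $139.00.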
Questions?
References
[1] Lawrence, H., Hughes, A., Tonic, R., & Zou, C. (2017, October). D-miner: A framework for mining, searching, visualizing, and alerting on darknet events. In Communications and Network Security (CNS), 2017 IEEE Conference on (pp. 1-9). IEEE.
[2] Mitchell, R. (2015). Web scraping with Python: Collecting data from the modern web. O'Reilly Media, Inc.
[3] Arun Patala. https://deepmlblog.wordpress.com/2016/01/03/how-to-break-a-captcha-system/
[4] people.cs.pitt.edu
[5] extremetech.com
[6] ahm3dibrahim.wordpress.com
[7] medium.com
[8] http://www.deathbycaptcha.com/