email address harvesting

Produced in cooperation with: HP Technology Forum & Expo 2009 © 2009 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Email Address Harvesting Michael Lamont Senior Software Engineer June 17, 2009

Upload: michael-lamont

Post on 08-Sep-2014


DESCRIPTION

HP Tech Forum 2009 presentation covering some of the ways spammers harvest email addresses on the Internet (and how you can prevent it), including an in-depth look at three commonly used software packages.

TRANSCRIPT

Page 1: Email Address Harvesting

Produced in cooperation with: HP Technology Forum & Expo 2009

© 2009 Hewlett-Packard Development Company, L.P.

The information contained herein is subject to change without notice

Email Address Harvesting Michael Lamont

Senior Software Engineer

June 17, 2009

Page 2: Email Address Harvesting

Overview

• What is email address harvesting?

• How do spammers do it?

• What can you do about it?

• Examples of harvesting software

Page 3: Email Address Harvesting

Mandatory Definition Slide

• Email address harvesting is the process used by spammers to extract email addresses from public sources.

• Common sources:

− Web sites

− Newsgroups

− Mailing lists

− Chat rooms

Page 4: Email Address Harvesting

Mandatory “How Bad Is It?” Slide

• FTC: 86% of all email addresses posted on web pages receive spam.

• FTC: 93% of all email addresses used in newsgroups receive spam.

• PSC honeypot record: Address received spam 4 minutes after being included in a newsgroup post.

Page 5: Email Address Harvesting

Address Lists

• Spammers use address harvesting to build giant lists of addresses to send spam to.

• Most lists have 1-20 million addresses.

• Spammers sell/share their lists, so being on even just one list will get you a lot of spam.

Page 6: Email Address Harvesting

Evolution Of The Address List

• Somebody (probably not even a spammer) harvests addresses from various sources.

• A “good” harvester scrubs the list.

• The harvester sells the list to lots of spammers.

• Once your address is on a list, it’s going to be on one or more lists forever.

Page 7: Email Address Harvesting

Harvesting From Web Sites

• Spammers usually use a spider program to scrape addresses off of web pages.
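As a rough illustration of why plainly written addresses are so easy to scrape, the sketch below applies one regular expression to a page's HTML. The pattern and the `find_addresses` name are assumptions made for this example, not taken from any real harvesting package, and the pattern only approximates real address syntax. It can be useful for checking what your own pages expose.

```python
import re

# Deliberately simple pattern: local part, "@", then a dotted domain.
# Real address syntax is broader; this is only an approximation.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_addresses(html_text: str) -> list[str]:
    """Return the unique address-like strings in html_text, in first-seen order."""
    seen = {}
    for match in ADDRESS_RE.finditer(html_text):
        seen.setdefault(match.group(0).lower(), None)
    return list(seen)

page = '<p>Contact <a href="mailto:jdoe@hp.com">jdoe@hp.com</a> or jdoe at hp dot com.</p>'
print(find_addresses(page))  # ['jdoe@hp.com']
```

Note that the longhand form at the end of the sample page is not matched, while both the link target and the visible text are.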

Page 8: Email Address Harvesting

Harvesting From Web Sites

Page 9: Email Address Harvesting

Harvesting From Web Sites

• Web directories make it easy to get lots of addresses

Page 10: Email Address Harvesting

Harvesting From Web Sites


Page 11: Email Address Harvesting

Usenet Newsgroups

• Email addresses are splattered all over:

− Message headers

− Signatures

− Attributions

• Spider programs exist to extract these addresses as well.

Page 12: Email Address Harvesting

Mailing Lists

• Much list manager software will hand out the address of every subscriber on a list.

• Spammers are happy to join a mailing list temporarily to get access to a list of subscribers.

• Some clever spammers repost an innocuous newbie question copied from the list archives with a read-receipt request attached; every receipt that comes back reveals a subscriber's address.

Page 13: Email Address Harvesting

3rd Party Mailing Lists

• Organizations you’ve given your address to pass it on to 3rd parties (usually for profit).

• Example: Auto insurance quote

• Initial sale of list might be aboveboard, but lists have a way of trickling down to less desirable senders.

Page 14: Email Address Harvesting

Web Browser Holes

• Newer browsers have eliminated most of these, but they’re still common in older browsers.

• Extraction of the email address from the From request header (HTTP_FROM in CGI) that the browser sends to the web server.

• JavaScript to extract email address from browser’s configuration.

Page 15: Email Address Harvesting

Web Browser Holes

• Force browser to fetch an image on a page by anonymous FTP.

− Most browsers use the configured email address as the password.

• JavaScript action that sends an email message in the background on page load.

Page 16: Email Address Harvesting

Chat Rooms

• Web bots monitor chat rooms and extract user names.

• Lots of providers (AOL, Yahoo) use the same profile names for both chat rooms and email.

• IRC used to be fertile harvesting ground, but less savvy users have largely stopped using it.

Page 17: Email Address Harvesting

Domain Contacts

• Every registered domain name has one or more contact addresses.

• Addresses are publicly accessible (WHOIS)

• Addresses are almost always valid and read by a real person on a regular basis.

Page 18: Email Address Harvesting

Guessing

• Spammers “guess together” a list of plausible email addresses (for example, common user names at a target domain).

• The addresses are tested against one or more email servers.

• Valid addresses are added to a list of addresses to be spammed.

• Usually referred to as a directory harvest attack (DHA).
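The guess, test, and keep loop above can be sketched in a few lines. Everything here is hypothetical: the name patterns, the `example.com` domain, and especially `server_accepts`, which stands in for a mail server's yes/no answer and is backed by an in-memory set rather than any network call.

```python
from itertools import product

def guess_addresses(first_names, last_names, domain):
    """Yield candidate addresses built from two common name patterns."""
    for first, last in product(first_names, last_names):
        yield f"{first}.{last}@{domain}"    # john.doe@...
        yield f"{first[0]}{last}@{domain}"  # jdoe@...

# Hypothetical stand-in for the set of mailboxes a server would accept.
KNOWN_MAILBOXES = {"jdoe@example.com", "mary.smith@example.com"}

def server_accepts(address: str) -> bool:
    """Simulated per-recipient answer; no connection is made."""
    return address in KNOWN_MAILBOXES

candidates = list(guess_addresses(["john", "mary"], ["doe", "smith"], "example.com"))
valid = [addr for addr in candidates if server_accepts(addr)]
print(len(candidates), "guesses,", len(valid), "valid:", valid)
```

The point of the sketch is that the attacker needs only one bit of information per guess, so any server behavior that distinguishes valid from invalid recipients is enough to build a list.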

Page 19: Email Address Harvesting

CAN-SPAM

• The federal CAN-SPAM Act explicitly makes email address harvesting illegal.

• Some providers of the harvesting software have ceased and desisted, but harvesting has actually increased.

• Like most legal solutions, CAN-SPAM is severely constrained by jurisdictional boundaries.

Page 20: Email Address Harvesting

Harvesting Prevention

• The harder it is for spammers to get your address, the harder it is for them to spam you.

• “I don’t care – my spam filter is awesome. Bring it on!”

• No filter is 100% accurate

• Filtering still places load on filtering system and/or email server.

Page 21: Email Address Harvesting

Prevention Methods

• Reformatting addresses

• Web forms

• JavaScript-generated mailto links

• Graphical addresses

• Throwaway addresses

Page 22: Email Address Harvesting

Reformatting Addresses

• Prevents harvesting from web pages and newsgroups.

• Simple examples include inserting bogus strings into the address to make it invalid:

[email protected]

[email protected]
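A minimal sketch of the bogus-string approach follows. The `NOSPAM` marker and both function names are assumptions chosen for illustration; any string that a human reader will recognize and delete works the same way.

```python
def add_bogus(address: str, marker: str = "NOSPAM") -> str:
    """Insert marker right after the "@" so the address is invalid as written."""
    local, _, domain = address.partition("@")
    return f"{local}@{marker}{domain}"

def remove_bogus(munged: str, marker: str = "NOSPAM") -> str:
    """What a human correspondent does by hand: delete the marker from the domain."""
    local, _, domain = munged.partition("@")
    return f"{local}@{domain.replace(marker, '', 1)}"

munged = add_bogus("jdoe@hp.com")
print(munged)                # jdoe@NOSPAMhp.com
print(remove_bogus(munged))  # jdoe@hp.com
```

A harvester will still collect the munged string, but mail sent to it goes to a domain that does not exist.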

Page 23: Email Address Harvesting

Reformatting Addresses

• Writing the address out longhand can prevent harvesters from recognizing it as an email address:

jdoe at hp dot com

• Inserting extra whitespace can also help:

jdoe @ hp.com

jdoe @ hp.com
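Both rewrites are mechanical, so a site can apply them automatically when rendering a page. The helper names below are hypothetical, and the sketch assumes a simple `user@host.tld` address.

```python
def spell_out(address: str) -> str:
    """Rewrite "user@host.tld" as "user at host dot tld"."""
    local, _, domain = address.partition("@")
    return f"{local} at {domain.replace('.', ' dot ')}"

def pad_at_sign(address: str) -> str:
    """Put spaces around the "@" so the address is no longer one contiguous token."""
    return address.replace("@", " @ ")

print(spell_out("jdoe@hp.com"))    # jdoe at hp dot com
print(pad_at_sign("jdoe@hp.com"))  # jdoe @ hp.com
```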

Page 24: Email Address Harvesting

Reformatting Addresses

• Characters written as HTML numeric character references (such as &#114; for “r”) are decoded by most web clients, but not by most spamware:

jdoe@p&#114;ocess&#046;com
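The encoding can be generated rather than typed by hand. In this sketch `entity_encode` and its default choice of which characters to encode are assumptions; `html.unescape` from the Python standard library plays the role of the web client and shows that the original address is fully recoverable.

```python
import html

def entity_encode(text: str, chars_to_encode: str = "@.") -> str:
    """Replace the selected characters with HTML numeric character references."""
    return "".join(f"&#{ord(c)};" if c in chars_to_encode else c for c in text)

encoded = entity_encode("jdoe@hp.com")
print(encoded)                 # jdoe&#64;hp&#46;com
print(html.unescape(encoded))  # jdoe@hp.com
```

The same decoding step is available to any harvester author who bothers to add it, which is why this technique only works against spamware that skips it.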

Page 25: Email Address Harvesting

Web Forms

• Provide an HTML form for web site visitors to enter a message.

• When the form is submitted, the CGI script mails the message to the appropriate recipient.

• Avoids displaying the actual address anywhere on the site.

• Can still be abused, but it’s relatively difficult to do.
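A minimal sketch of the server side of such a form is shown below. It assumes the page only ever exposes a short key (here `sales`) and that the script maps the key to the real mailbox; the `RECIPIENTS` table, the field names, and the fixed sender address are all hypothetical. The message is built but not sent, and the line-break check on the subject is one example of the kind of validation that makes abuse harder.

```python
from email.message import EmailMessage

# Hypothetical mapping from the key shown in the form to the hidden mailbox.
RECIPIENTS = {"sales": "jdoe@hp.com"}

def build_message(form: dict[str, str]) -> EmailMessage:
    """Turn submitted form fields into a message for the hidden recipient."""
    to_addr = RECIPIENTS[form["to"]]  # unknown keys raise KeyError
    subject = form.get("subject", "")
    if "\n" in subject or "\r" in subject:
        # Refuse attempts to smuggle extra headers in through the subject.
        raise ValueError("line breaks are not allowed in the subject")
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = "webform@example.com"  # hypothetical fixed sender
    msg["Subject"] = subject
    msg.set_content(form.get("body", ""))
    return msg

msg = build_message({"to": "sales", "subject": "Question", "body": "Hello"})
print(msg["To"])
```

Because the recipient is looked up from a fixed table, a visitor cannot redirect the form to an arbitrary address.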

Page 26: Email Address Harvesting

Web Forms

Page 27: Email Address Harvesting

JavaScript-Generated mailtos

• Use JavaScript to dynamically generate the mailto: link when the link is clicked.

<A HREF='javascript:window.location="mail"+"to:"+"jdoe"+"@"+"hp"+"."+"com"; void 0;'>Click here to mail John Doe</A>
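Links like this are tedious to write by hand, so a page generator could emit them. The sketch below is one possible helper, not part of the original presentation: `mailto_link` is a hypothetical name, it assumes the address contains no quote characters, and it ends the script with `void 0` so the value of the assignment is not rendered as a new page.

```python
import html

def mailto_link(address: str, label: str) -> str:
    """Build an anchor whose href assembles the mailto: URL from string fragments."""
    local, _, domain = address.partition("@")
    # Split the domain at each dot so the full address never appears contiguously.
    pieces = ["mail", "to:", local, "@", *domain.replace(".", " . ").split()]
    expr = "+".join(f'"{piece}"' for piece in pieces)
    return (f"<A HREF='javascript:window.location={expr}; void 0;'>"
            f"{html.escape(label)}</A>")

print(mailto_link("jdoe@hp.com", "Click here to mail John Doe"))
```

The generated markup never contains the string `jdoe@hp.com`, so a pattern-matching harvester has nothing to match.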

Page 28: Email Address Harvesting

Graphical Addresses

• Displaying all or part of an email address as a graphical image will throw off most harvesting software.

• No known harvesting software is OCR-capable.

− Anecdotal reports of at least one large spam organization trying to develop accurate OCR harvesters

Page 29: Email Address Harvesting

Graphical Address Complexity

• Graphical @ sign:

− Probably sufficient to throw off most harvesters.

− Username and hostname are still in close proximity.

− Works easily for multiple users/multiple domains.

jdoe [image of @ sign] hp.com

Page 30: Email Address Harvesting

Graphical Address Complexity

• Graphical @hostname:

− Should prevent any harvester from working.

− Requires a different image for each email domain.

jdoe [image of @hostname]

Page 31: Email Address Harvesting

Graphical Address Complexity

• Graphical everything:

− For the truly paranoid.

− Completely unreadable by harvesters unless they’re OCR-enabled.

− Requires either a lot of images or a script that can dynamically generate them.

Page 32: Email Address Harvesting

Throwaway Addresses

• Many people create an email account that they use only for web pages and newsgroups.

• Some software products go further and let you create an alias for every occasion.

• You still need a static address for business cards, resumes, etc.
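One way an alias-per-occasion scheme might work is sketched below. It assumes the mail system delivers `mailbox+tag@domain` to `mailbox` (often called plus addressing, which not every provider supports), and the function names and secret are hypothetical. Deriving the tag from a secret means the owner can later tell which site an alias was given to without keeping a table.

```python
import hashlib
import hmac

def alias_for(site: str, mailbox: str, domain: str, secret: bytes) -> str:
    """Derive a per-site alias of the form mailbox+tag@domain."""
    tag = hmac.new(secret, site.lower().encode(), hashlib.sha256).hexdigest()[:8]
    return f"{mailbox}+{tag}@{domain}"

def is_alias_for(site: str, address: str, mailbox: str, domain: str, secret: bytes) -> bool:
    """Check whether address is the alias that was issued to site."""
    return hmac.compare_digest(address, alias_for(site, mailbox, domain, secret))

secret = b"change-me"  # hypothetical secret known only to the mailbox owner
alias = alias_for("news.example.org", "jdoe", "hp.com", secret)
print(alias)
print(is_alias_for("news.example.org", alias, "jdoe", "hp.com", secret))
```

If spam starts arriving at one alias, that alias can be blocked and the site that leaked it is identified.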

Page 33: Email Address Harvesting

Harvesting Software

• Spammers use tons of specialized software (spamware) to harvest addresses.

• Most spamware is developed in Eastern Europe and Asia.

• We’re going to look at three of the most popular packages.

Page 34: Email Address Harvesting

List Harvester

• Harvests addresses from web sites.

• “Targeted” harvesting - in theory, the harvested email addresses have something in common.

• Appears to be based in China.

• http://www.listharvester.com

• Price: $699 US

Page 35: Email Address Harvesting

List Harvester - Method

• Performs a search for one or more keywords on the user’s choice of search engine.

• Parses every site returned by the search engine in order, looking for addresses and links.

• Follows links to other pages and parses them for addresses as well.
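The method described above amounts to a breadth-first crawl seeded with search results. The sketch below illustrates that general technique only; it is not List Harvester's actual implementation. The `PAGES` dictionary is a hypothetical in-memory stand-in for the web, so nothing is fetched over the network, and the regular expressions are simplified assumptions.

```python
import re
from collections import deque

ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
LINK_RE = re.compile(r'href="([^"]+)"')

# Hypothetical in-memory "web": URL -> HTML. No network access is involved.
PAGES = {
    "/a": '<a href="/b">team</a> <a href="mailto:jdoe@hp.com">mail</a>',
    "/b": 'Write to asmith@hp.com. <a href="/a">back</a>',
}

def crawl(search_results, pages, max_pages=100):
    """Visit search results in order, then linked pages, collecting addresses."""
    queue, visited, found = deque(search_results), set(), []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited or url not in pages:
            continue  # skip repeats and pages that cannot be fetched
        visited.add(url)
        text = pages[url]
        for addr in ADDRESS_RE.findall(text):
            if addr not in found:
                found.append(addr)
        # Queue ordinary links for later parsing; mailto: targets are not pages.
        queue.extend(link for link in LINK_RE.findall(text)
                     if not link.startswith("mailto:"))
    return found

print(crawl(["/a"], PAGES))  # ['jdoe@hp.com', 'asmith@hp.com']
```

The visited set keeps the crawl from looping when pages link back to each other, as the two sample pages do.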

Page 36: Email Address Harvesting

List Harvester

• Start screen:

Page 37: Email Address Harvesting

List Harvester

• Search terms entry:

Page 38: Email Address Harvesting

List Harvester

• Search parameters:

Page 39: Email Address Harvesting

List Harvester

• Search filters:

Page 40: Email Address Harvesting

List Harvester

• Parsing engine options:

Page 41: Email Address Harvesting

List Harvester

• Saving list of extracted addresses:

Page 42: Email Address Harvesting

List Harvester

• Harvesting in progress:

Page 43: Email Address Harvesting

Atomic Email Hunter

• Harvests addresses from web sites.

• Either scans an entire web site for addresses or performs a “targeted search” like List Harvester.

• Based in Russia, most likely Moscow.

• http://www.massmailsoftware.com/

• Price: $79.85 US

Page 44: Email Address Harvesting

Atomic Email Hunter

• Start screen:

Page 45: Email Address Harvesting

Atomic Email Hunter

• Web download settings:

Page 46: Email Address Harvesting

Atomic Email Hunter

• Address filtering settings:

Page 47: Email Address Harvesting

Atomic Email Hunter

• Run:

Page 48: Email Address Harvesting

Atomic Email Hunter

• Results:

Page 49: Email Address Harvesting

Fast Newsgroups Extractor

• Harvests addresses from newsgroups.

• Has a companion web site extractor that’s very similar to Atomic Email Hunter.

• Based in Russia, most likely Moscow.

• http://www.lencom.com

• Price: $79.00 US

Page 50: Email Address Harvesting

Fast Newsgroups Extractor - Method

• Lets user select one or more newsgroups to extract content from.

• Downloads multiple messages simultaneously from the NNTP server.

• Extracts addresses from the downloaded messages.

• Has the ability to limit downloaded messages to those that contain certain text in the subject.
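The extraction and subject-filter steps can be sketched with Python's standard `email` package. This is an illustration of the general approach, not the product's code: `ARTICLES` is a hypothetical list of already-downloaded articles, no NNTP connection is made, the simultaneous download step is omitted, and only the From and Reply-To headers are examined.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical articles, already downloaded; no NNTP connection is made here.
ARTICLES = [
    "From: John Doe <jdoe@hp.com>\nSubject: VMS question\n\nbody text\n",
    "From: asmith@hp.com\nSubject: For sale\n\nbody text\n",
]

def addresses_from_articles(articles, subject_contains=""):
    """Collect From/Reply-To addresses from articles whose subject matches."""
    found = []
    for raw in articles:
        msg = message_from_string(raw)
        # An empty filter matches every subject.
        if subject_contains.lower() not in msg.get("Subject", "").lower():
            continue
        headers = msg.get_all("From", []) + msg.get_all("Reply-To", [])
        for _name, addr in getaddresses(headers):
            if addr and addr not in found:
                found.append(addr)
    return found

print(addresses_from_articles(ARTICLES, subject_contains="vms"))  # ['jdoe@hp.com']
```

Addresses in signatures and attributions live in the message body, so they would need a text pattern match rather than header parsing.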

Page 51: Email Address Harvesting

Fast Newsgroups Extractor

• Start screen:

Page 52: Email Address Harvesting

Fast Newsgroups Extractor

• News server setup:

Page 53: Email Address Harvesting

Fast Newsgroups Extractor

• Newsgroup list download:

Page 54: Email Address Harvesting

Fast Newsgroups Extractor

• Newsgroup selection:

Page 55: Email Address Harvesting

Fast Newsgroups Extractor

• Harvesting job setup:

Page 56: Email Address Harvesting

Fast Newsgroups Extractor

• Run:

Page 57: Email Address Harvesting

Quick Review

• We talked about:

− What email address harvesting is

− What data sources are harvested

− How you can protect your addresses

− 3 software packages used by spammers to harvest addresses

Page 58: Email Address Harvesting
