Spam and the Ongoing Battle for the Inbox

By Joshua Goodman, Gordon V. Cormack, and David Heckerman
Communications of the ACM, February 2007/Vol. 50, No. 2

Even as spammers and phishers try ever-more sophisticated techniques to get past filters and into users’ mailboxes, anti-spam researchers have managed to stay several steps ahead, so far.

Since August 1998, when Communications published the article “Spam!” by Lorrie Faith Cranor and Brian A. LaMacchia describing the then rapidly growing onslaught of unwanted email, the amount of all email sent has grown exponentially, but the volume of spam has grown even more. Spam has increased from approximately 10% of overall mail volume in 1998, constituting an annoyance, to as much as 80% today [8], creating an onerous burden on both tens of thousands of email service providers (ESPs) and tens of millions of end users worldwide.

Illustration by Robert Neubecker




Large email services (such as Microsoft’s Hotmail) may be sent more than a billion spam messages per day. Fortunately, however, since 1998, considerable progress has also been made in stopping spam. Only a small fraction of that 80% actually reaches end users. Today, essentially all ESPs and most email programs include spam filters. From the point of view of the end user, the problem is stabilizing, and spam is for most users today an annoyance rather than a threat to their use of email.

Meanwhile, an ongoing escalation of technology is taking place behind the scenes, with both spammers and spam-filter providers using increasingly sophisticated solutions. Whenever researchers and developers improve spam-filtering software, spammers devise more sophisticated techniques to defeat the filters. Without constant innovation from spam researchers, the deluge of spam being filtered would overwhelm users’ inboxes.

Spam research is a fascinating topic not only in its own right but also in how it relates to other fields. For instance, most spam-filtering programs use at least one machine-learning component. Spam research has exposed shortcomings in current machine-learning technology and driven new areas of research. Similarly, spam and phishing have made the need to verify senders’ identities on the Internet more important than ever and led to new, more practical verification methods. Spam filtering is an example of adversarial information processing; related methods may apply not only to email spam but to many situations in which an active opponent attempts to thwart any new defensive approach.

Around the time spam was becoming a major problem in 1997, one of us (Heckerman), along with other colleagues at Microsoft Research, began work on machine-learning approaches to spam filtering [11]. In these approaches, computer programs are provided examples of both spam and good (non-spam) email (see Figure 1). A learning algorithm is then used to find the characteristics of the spam mail versus those of the good mail. Future messages can be automatically categorized as highly likely to be spam, highly likely to be good, or somewhere in between. The earliest learning approaches were fairly simple, using, say, the Naive Bayes algorithm to count how often each word or other feature occurs in spam messages and in good messages.
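As a rough illustration of this early word-counting approach (not the authors’ actual implementation; the tokenizer, smoothing, and tiny corpus here are invented for illustration), a Naive-Bayes-style scorer can be sketched as:

```python
import math
from collections import Counter

# Toy Naive-Bayes-style spam scorer: count how often each word occurs in
# known spam and known good mail, then score new messages by summed log-odds.

def tokenize(text):
    return text.lower().split()

def train(spam_msgs, good_msgs):
    spam_counts = Counter(w for m in spam_msgs for w in tokenize(m))
    good_counts = Counter(w for m in good_msgs for w in tokenize(m))
    return spam_counts, good_counts

def spam_score(msg, spam_counts, good_counts):
    # Add-one smoothing so unseen words do not zero out the score.
    spam_total = sum(spam_counts.values()) + len(spam_counts) + 1
    good_total = sum(good_counts.values()) + len(good_counts) + 1
    score = 0.0
    for w in tokenize(msg):
        p_spam = (spam_counts[w] + 1) / spam_total
        p_good = (good_counts[w] + 1) / good_total
        score += math.log(p_spam / p_good)
    return score  # positive suggests spam, negative suggests good

spam_counts, good_counts = train(
    ["free money now", "free offer win money"],
    ["meeting at noon", "project status report"],
)
print(spam_score("free money", spam_counts, good_counts) > 0)       # True
print(spam_score("project meeting", spam_counts, good_counts) < 0)  # True
```

In practice the score is compared against a conservatively chosen threshold rather than zero, for the reasons discussed below.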

To be effective, Naive Bayes and other methods need training data—known spam and known good mail—to train the system. When we first shipped spam filters, spam was relatively static. We had 20 users manually collect and hand-label their email. We then used this collection to train a filter that was not updated for many months. Words like “sex,” “free,” and “money” were all good indicators of spam that worked for an extended period. As spam filtering became more widely deployed, spammers adapted, quickly learning the most obvious words to avoid and the most innocuous words to add to trick the filter. It became necessary to gather ever-larger amounts of email (as spammers began using a wider variety of terms), as well as to update the filters frequently to keep up with spammers. Today, Hotmail uses a feedback-loop system in which more than 100,000 volunteers each day are asked to label a message that was sent to them as either “spam” or “good” email. This regularly provides us new messages to train our filters, allowing us to react quickly to new spammer attacks and ploys.

Figure 1. Spam filters separate email into two folders: the inbox, which is read regularly, and quarantine, which is searched occasionally. Mistakes—spam in the inbox or good email in quarantine—may be reported to the filter if noticed.


Besides getting more data, faster, we also now use much more sophisticated learning algorithms. For instance, algorithms based on logistic regression and support vector machines can reduce by half the amount of spam that evades filtering, compared to Naive Bayes. These algorithms “learn” a weight for each word in a message. The weights are carefully adjusted so results derived from the training examples of both spam and good email are as accurate as possible. The learning process may require repeatedly adjusting tens of thousands or even hundreds of thousands of weights, a potentially time-consuming process. Fortunately, progress in machine learning over the past few years has made such computation possible. Sophisticated algorithms (such as Sequential Conditional Generalized Iterative Scaling) allow us to learn a new filter from scratch in about an hour, even when training on more than a million messages.
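A minimal sketch of this one-weight-per-word idea, using plain stochastic gradient descent on an invented four-message corpus (the large-scale systems described here use far faster training algorithms and vastly more data):

```python
import math

# Toy logistic-regression spam filter: learn one weight per word so that
# predicted spam probabilities match the training labels.

def features(msg):
    return set(msg.lower().split())

train_set = [
    ("free money now", 1),          # 1 = spam
    ("win free offer", 1),
    ("meeting notes attached", 0),  # 0 = good
    ("lunch at noon", 0),
]

weights = {}
bias = 0.0
for _ in range(200):                # epochs of stochastic gradient descent
    for msg, label in train_set:
        z = bias + sum(weights.get(w, 0.0) for w in features(msg))
        p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of spam
        grad = label - p                  # push prediction toward the label
        bias += 0.1 * grad
        for w in features(msg):
            weights[w] = weights.get(w, 0.0) + 0.1 * grad

def p_spam(msg):
    z = bias + sum(weights.get(w, 0.0) for w in features(msg))
    return 1.0 / (1.0 + math.exp(-z))

print(p_spam("free money") > 0.5)      # True: "free"/"money" learned as spammy
print(p_spam("meeting at noon") < 0.5) # True
```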

Spam filtering is not the only beneficiary of advances in machine learning; it also drives exciting new research. For instance, machine-learning algorithms are typically trained to maximize accuracy (how often their predictions are correct). But in practice, for spam filtering, fraud detection, and many other problems, these systems are configured very conservatively. Only if an algorithm is nearly certain that a message is spam is the message filtered. This issue has driven recent research in how to learn specifically for these cases. One clever technique that can reduce spam by 20% or more—developed by Scott Yih of Microsoft Research—involves training two filters. The first identifies the difficult cases; the second is trained on only those cases. Focusing on them improves overall results [12].

Meanwhile, spammers have not been idle as machine learning has progressed. Traditional machine learning for spam filtering has many weaknesses. One of the most glaring is that the first step in almost any system is to break apart a message into its individual words, then perform an analysis at the word level. Initially, spammers sought to overcome these filters by making sure that words with large (spammy) weights, like “free,” did not appear verbatim in their messages. For instance, they might break the word into multiple pieces using an HTML comment (fr<!--><-->ee) or encode it with HTML character codes (fr&#101;e). When displayed to a user, both these examples look like “free,” but for spam-filtering software, especially on servers, any sort of complex HTML processing is too computationally expensive, so the systems do not detect the word “free.”
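Both tricks can be reproduced directly. The sketch below shows the normalization a filter would need to undo them; the regex-based cleanup is an illustrative approximation, not a full HTML parser, and as noted above such processing can be too expensive to run on every message at server scale:

```python
import html
import re

token_broken = "fr<!--><-->ee"   # word split apart with HTML comments
entity_coded = "fr&#101;e"       # the letter "e" written as a character code

def normalize(s):
    s = re.sub(r"<!--.*?-->", "", s)  # strip HTML comments
    s = re.sub(r"<[^>]*>", "", s)     # strip any remaining tags
    return html.unescape(s)           # decode &#101; back to "e"

print(normalize(token_broken))   # free
print(normalize(entity_coded))   # free
```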

Spammers do not use these techniques randomly; they carefully monitor what works and what doesn’t. For instance, in 2003, we saw that the HTML character-encoding trick was being used in 5% of spam sent to Hotmail. The trick is easy to detect, however, since it is rare for a normal letter like “e” to be encoded in ASCII in legitimate email. Shortly after we and others began detecting it, spammers stopped using it; by 2004, it was down to 0% [6]. On the other hand, the token-breaking trick using comments or other tags can be done in many different ways, some difficult to detect; from 2003 to 2004, this exploit went from 7% to 15% of all spam at Hotmail. When we attack the spammers, they actively adapt, dropping the techniques that don’t work and increasing their use of the ones that do.

Scientific evaluation is an essential component of research; researchers must be able to compare methods using standard data and measurements. This kind of evaluation is particularly difficult for spam filtering. Due to the sensitivity of email (few of us would allow our own to be publicly distributed, and those who would are hardly typical), building a standard benchmark for use by researchers is difficult. Within the context of the larger Text REtrieval Conference (TREC) effort begun in 1991, a U.S.-government-supported program (trec.nist.gov) that facilitates evaluations, one of us (Cormack) founded and coordinates a special spam track to evaluate participants’ filters on real email streams; more important, it defines standard measures and corpora for future tests. It relies on two types of email corpora:

• Synthetic, consisting of a rare public corpus of good email, combined with a carefully modified set of recent spam. It can be freely shared, and researchers run their filters on it; and

• Private, whereby researchers submit their code to testers, who run it on the private corpora and return summary results only, ensuring privacy.

Results in terms of distinguishing spam from good email on the two types are similar, suggesting that, for the first time, relatively realistic comparisons of different spam-filtering techniques may be carried out by different groups. Future TREC tracks aim to develop even more realistic evaluation strategies. Meanwhile, the European Conference on Machine Learning (www.ecmlpkdd2006.org) addressed the issue of spam-filter evaluation through its 2006 Discovery Challenge, testing the efficacy of spam filtering without user feedback.
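These evaluations report results along two axes: the false-positive rate (good mail judged spam) and the spam-misclassification rate (spam judged good). A minimal scorer over (gold, predicted) label pairs might look like this (a sketch; function and variable names are invented for illustration):

```python
# Compute the two TREC-style error rates from gold/predicted label pairs.

def error_rates(pairs):
    good = [(g, p) for g, p in pairs if g == "good"]
    spam = [(g, p) for g, p in pairs if g == "spam"]
    fpr = sum(p == "spam" for _, p in good) / len(good)   # good judged spam
    miss = sum(p == "good" for _, p in spam) / len(spam)  # spam judged good
    return fpr, miss

pairs = [("good", "good"), ("good", "spam"), ("spam", "spam"), ("spam", "spam")]
print(error_rates(pairs))   # (0.5, 0.0)
```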

One surprising result is that a compression-based technique is more effective for spam filtering than traditional machine-learning systems [1].



Compression-based systems build a model of spam and a model of good email. A new message is compressed using both the spam model and the good-email model. If the message compresses better with the spam model, the message is likely spam; if it compresses better with the good-email model, the message is more likely legitimate. While compression-based filtering techniques have (in theory) been well understood for years, this is the first instance we know of in which they beat traditional machine-learning systems. The best compression-oriented results have used Dynamic Markov Coding; however, better-known techniques (such as Prediction by Partial Matching, or PPM) work nearly as well.
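A crude sketch of this idea, using zlib as a stand-in for real DMC or PPM models (the corpora here are invented for illustration): a message is scored by how many extra bytes it costs to compress after each model’s training text, and the model that accommodates it more cheaply wins.

```python
import zlib

# Tiny invented "training" corpora standing in for learned compression models.
spam_corpus = b"buy cheap pills free money win prize click here " * 20
good_corpus = b"meeting agenda project report schedule lunch notes " * 20

def extra_bytes(model_text, msg):
    # Bytes added when the message is compressed after the model text;
    # messages resembling the model compress into fewer extra bytes.
    return len(zlib.compress(model_text + msg, 9)) - len(zlib.compress(model_text, 9))

def classify(msg):
    if extra_bytes(spam_corpus, msg) < extra_bytes(good_corpus, msg):
        return "spam"
    return "good"

print(classify(b"win free money now"))      # spam
print(classify(b"project meeting notes"))   # good
```

Because the comparison operates on raw bytes, word-obfuscation tricks that fool tokenizers leave this kind of scorer largely unaffected, which matches the robustness property discussed below.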

These compression-oriented results open a variety of avenues for ongoing research. However, we have yet to understand fully why they work so well for spam filtering. Can they be adapted to work well for other text-classification problems, or is there something unique about spam, and if so, what? One clear advantage of these techniques is they work even against sophisticated obfuscations. Because they apply to a stream of bits, they are inherently insensitive to character encoding and to the makeup of words; hence they are robust to many spammer tricks, like the HTML obfuscations outlined in Figure 2.

Compression models also pose deployment challenges. First, they can be large, especially when trained on large amounts of data. Second, they may implicitly contain pieces of real email, causing privacy issues. For some kinds of filters (such as personal ones users build for themselves on their own data), they are extremely promising. For a filter built by, say, a large company using many users’ data and widely deployed to other users, these issues remain to be solved, while more conventional learning techniques are still state of the art (see Figure 3).

One notable challenge for spam-filter research is that the techniques spammers use are different for different recipients; their ability to adapt also differs. For instance, Hotmail and other large ESPs are subjected to targeted spammer attacks. Spammers easily obtain accounts on these systems, and before sending in bulk, they might keep sending test messages over and over until they understand how to beat a particular filter. Our filters adapt over time. Some spammers, as they send large quantities of spam, monitor whether or not these messages are being received on their test accounts and may dynamically adapt their techniques. On the other hand, large ESPs also give some advantage to spam fighters. Hotmail and other large ESPs are able to quickly aggregate information about hundreds of millions of users, rapidly updating their filters as new attacks are detected.

It is difficult to measure how successful spammers’ adaptations are in defeating spam filters. In the course of the TREC spam track, we have evaluated new and old machine-learning filters on new and old email, finding no material difference in filtering performance related to the age of the email. Yet we have also seen that filters trained on recent spam perform much better than those trained on email even a few weeks old. We have also found through certain techniques deployed at Hotmail that over a broad range of historical data, the techniques worked well, but within a week of deployment, spammers had already adapted.


Figure 2. Comparison of compression-based techniques DMC and PPM with common filters (dbacl, Bogofilter, SpamAssassin, SpamProbe, CRM114, SpamBayes) using the TREC methodology [1]. The graph plots % spam misclassification against false-positive rate (%), both on logit scales.

Figure 3. DMC colors fragments of the message: red if they are likely to be found in spam, green if they are likely to be found in good email. The sample shown is a pharmaceutical spam message whose drug names are spelled out with inserted characters and which is padded with passages of ordinary prose.



Overall, it is clear that spam changes quickly, and spammers react to changes in filtering techniques. Less clear is whether spam is getting more difficult over time or whether spammers are simply rotating from one technique to another, without making absolute progress.

IP-ADDRESS-BASED TECHNIQUES

It may be that techniques based on the content of the message are defeated too easily; there may simply be too many ways to obfuscate content. Many spam-filter researchers have thus focused on aspects of spam that cannot be hidden. The sender of a message—its IP address—is the most important of them.

The most common method for IP-address filtering is to simply blacklist certain IP addresses. When an address is known to send spam, it can just be barred from sending any email for a period of time. This approach can be effective, and several groups produce and share lists of bad addresses. This approach also involves limitations, however. For instance, spammers have become adept at switching IP addresses. Most blacklists are updated hourly, prompting some spammers to acquire huge amounts of bandwidth to allow them to send tens of millions of messages per IP address in the hour or so before the email is blocked; they then switch to another one. Blacklists can also result in false positives (lost good mail) when a good sender inherits a blacklisted IP address or a single IP address is used to send both spam and good email. Blacklists are a powerful tool but no panacea.
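The hour-scale dynamics described above can be sketched with a time-limited blacklist (a toy in-memory version; real deployments typically query shared DNS-based blocklists, and the class name and one-hour duration here are invented):

```python
import time

# Minimal time-limited IP blacklist: an address is barred for ttl_seconds
# after being listed, then falls off automatically.

class Blacklist:
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.blocked = {}            # ip -> expiry timestamp

    def block(self, ip, now=None):
        now = time.time() if now is None else now
        self.blocked[ip] = now + self.ttl

    def is_blocked(self, ip, now=None):
        now = time.time() if now is None else now
        expiry = self.blocked.get(ip)
        return expiry is not None and now < expiry

bl = Blacklist(ttl_seconds=3600)
bl.block("203.0.113.7", now=0)
print(bl.is_blocked("203.0.113.7", now=1800))   # True: within the hour
print(bl.is_blocked("203.0.113.7", now=7200))   # False: listing expired
```

The expiry behavior is exactly what spammers exploit: send as much as possible from one address before the listing lands, then move on to a fresh address.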

Some spammers are extremely clever at trying to circumvent IP-blocking systems. One common technique is to enlist so-called zombie machines, or botnets: computers, typically owned by consumers, that have been infected with viruses or Trojans that give spammers full control of the machine. The spammers then use them to send spam. Zombies provide interesting insight into the spam ecosystem. Spammers themselves rarely take over machines. Instead, specialists infect machines, then rent them out to spammers. Estimates of the price charged by the specialists for these machines vary, but at least one botnet operator rented them for $3/computer/month.

Some methods spammers use to obtain IP addresses are amazingly sophisticated. For instance, in one complex attack, where an ISP blocked outbound traffic on port 25 (the email port) but not inbound traffic, spammers were able to perform low-level TCP/IP protocol hacking to route outbound traffic through unblocked machines and inbound packets to the blocked machines; the result was that the email appeared to have been sent by the blocked machines.

SECURE IDENTITY

Numerous attempts have sought to introduce cryptographically secure identities to email, including such standards as PGP and S/MIME, but none has been widely adopted. Identity is particularly important in spam filtering. Almost all spam filters have some form of safe list, allowing users and administrators to identify senders whose email they trust. But without a working identity solution, spammers might abuse these safe lists by, for instance, sending email claiming to be from someone (such as [email protected]) who is commonly safelisted. In addition, in phishing spam—a particularly insidious type of spam—spammers impersonate a legitimate business in order to steal passwords, credit card numbers, Social Security numbers, and other sensitive personal information. A working identity solution can have a substantial effect on these spammers.

Traditional cryptographic approaches to identitysecurity have been robust to most attacks but too dif-ficult to deploy for practical reasons. They typicallyfocus on the identity of a person rather than the iden-tity of an email address, thus requiring a certifyingagency of some sort. Some proposals would require allInternet users to go to their local Post Office and paya fee to get a certificate. In addition, these proposals

COMMUNICATIONS OF THE ACM February 2007/Vol. 50, No. 2 29

technique is more effectivefor spam filtering than tra-ditional machine learningsystems [1]. Compression-based systems build amodel of spam and amodel of good email. Anew message is com-pressed using both thespam model and thegood-email model. If themessage compresses betterwith the spam model, themessage is likely spam; if itcompresses better with thegood-email model, the message is more likely legiti-mate. While compression-based filtering techniqueshave (in theory) been well understood for years, thisis the first instance we know of in which they beat tra-ditional machine-learning systems. The best compres-sion-oriented results have used Dynamic MarkovCoding; however, better known techniques (such asPrediction by Partial Matching, or PPM) work nearlyas well.

These compression-oriented results open a varietyof avenues for ongoing research. However, we haveyet to understand fully why they work so well forspam filtering. Can they be adapted to work well forother text-classification problems, or is there some-thing unique about spam, and if so, what? One clearadvantage of these techniques is they work evenagainst sophisticated obfuscations. Because they applyto a stream of bits they are inherently insensitive tocharacter encoding and to the makeup of words;hence they are robust to many spammer tricks, likethe HTML obfuscations outlined in Figure 2.

Compression models also pose deployment chal-lenges. First, they can be large, especially whentrained on large amounts of data. Second, they mayimplicitly contain pieces of real email, causing privacyissues. For some kinds of filters (such as personal onesusers build for themselves on their own data), they areextremely promising. For a filter built by, say, a largecompany using many users’ data widely deployed to

other users, these issuesremain to be solved, whilemore conventional learn-ing techniques are stillstate of the art (see Figure3).

One notable challengefor spam-filter research isthat the techniques spam-mers use are different fordifferent recipients; theirability to adapt also dif-fers. For instance, Hot-mail and other large ESPsare subjected to targetedspammer attacks. Spam-mers easily obtainaccounts on these sys-tems, and before sendingin bulk, they might keep

sending test messages over and over until they under-stand how to beat a particular filter. Our filters adaptover time. Some spammers, as they send large quan-tities of spam, monitor whether or not these messagesare being received on their test accounts and maydynamically adapt their techniques. On the otherhand, large ESPs also give some advantage to spamfighters. Hotmail and other large ESPs are able toquickly aggregate information about hundreds of mil-lions of users, rapidly updating their filters as newattacks are detected.

It is difficult to measure how successful spam-mers’ adaptations are in defeating spam fil-ters. In the course of the TREC spam track,we have evaluated new and old machine-learning filters on new and old email, finding

no material difference in filtering performance relatedto the age of the email. Yet we have also seen that fil-ters trained on recent spam perform much better thanthose trained on email even a few weeks old. We havealso found through certain techniques deployed atHotmail that over a broad range of historical data, the

28 February 2007/Vol. 50, No. 2 COMMUNICATIONS OF THE ACM

Goodman graph (2/07) - 19.5 picas width

DMC

0.01

0.10

1.00

10.00

50.000.01 0.10 1.00

False positive rate (%) (logit scale)

% S

pam

Mis

clas

sific

atio

n (lo

git

scal

e)

10.00 50.00

PPMdbacl

BogofilterSpamAssassin

SpamProbeCRM114

SpamBayes

Figure 2. Comparison of compression-based techniquesDMC and PPM in common filtersusing the TREC methodology [1].

Goodman spam (2/07)

From: “Iris Gist” <[email protected]>To: [email protected]: postnata 4344Date: Fri, 26 May 2006 11:21:26 -0700

Hi,=20M e R / D / AV / a G R AP R O z & CA m o x / c i I l / nC i A L / SV A L / u MT r & m a d o IA m B / E NX & n a xL e V / T R AS O m &=20http://www.prosebutis.com <http://www.prosebutis.com>=20

have no more argument. I have chosen Mr. Baggins and that ought to !6te=20enough for all of you. If I say he is a Burglar, a Burglar he is, or=20will be when the time comes. There is a lot more in him than you guess,=20and a deal more than he has any idea of himself. You may (possibly) all=20live to thank me yet. Now Bilbo, my boy, fetch the lamp, and lets have=20little light on this!=20

Figure 3. DMC colorsfragments of the

message: red if they arelikely to be found in

spam, green if they arelikely to be found in

good email.

Spammers themselves rarely take over machines. Instead, SPECIALISTS INFECT MACHINES, then rentthem out to spammers.

Page 6: Spam and the Ongoing Battle for the Inbox

COMMUNICATIONS OF THE ACM February 2007/Vol. 50, No. 2 31

usually require some form of attachment or inclusionin the email message itself, confusing some users.

In contrast, identity solutions driven by spam havebeen more pragmatic. In particular, Domain KeysIdentified Mail and SenderID have both focused onidentity at the domain level; they make it possible foremail servers to determine whether this email reallycame from this domain. In addition, both DKIM andSenderID have used the existing Domain Name Sys-tem infrastructure to distribute key information.While the DNS infrastructure is far less secure thanare commonly proposed cryptographic solutions, ithas only rarely been compromised in practice. Thispragmatic approach to identity has allowed surpris-ingly quick adoption of these new techniques.Although SenderID was released in 2004, almost40% of non-spam email today is SenderID-compli-ant, thus reducing the opportunities for email spoofing.

Spammers adapted their attack techniques to thistechnology as well. When first released, SenderID wasused more by spammers than by legitimate senders;spammers would create a new domain name, thencreate the relevant records, proving that the email wasnot spoofed. It is important to understand that theseidentity solutions are not aimed at stopping spamdirectly. Rather, they are a key part of a more complexstrategy, aiming to prevent safe-list abuse and phish-ing while allowing the spam filtering component tolearn “good” reputations for legitimate senders.They’ve shown early success in moving toward allthree goals.
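The compression-based classification idea described earlier can be sketched with an off-the-shelf compressor standing in for DMC or PPM. In this toy example (the corpora and function names are invented for illustration), a message is scored by how many extra bytes it costs to compress after each corpus, a rough proxy for its cross-entropy under that corpus's model:

```python
import zlib

def extra_bytes(corpus: bytes, msg: bytes) -> int:
    # Bytes needed to encode msg given the corpus as context: a rough
    # stand-in for the message's cross-entropy under that corpus's model.
    return len(zlib.compress(corpus + msg, 9)) - len(zlib.compress(corpus, 9))

def classify(msg: bytes, spam_corpus: bytes, good_corpus: bytes) -> str:
    # Label the message with whichever model "compresses it better."
    spam_cost = extra_bytes(spam_corpus, msg)
    good_cost = extra_bytes(good_corpus, msg)
    return "spam" if spam_cost < good_cost else "good"

spam_corpus = b"buy cheap pills now free offer click here act today " * 40
good_corpus = b"the meeting agenda and project report are attached below " * 40
print(classify(b"cheap pills free offer click now", spam_corpus, good_corpus))
```

Because the scoring sees only a byte stream, exotic character encodings and word-boundary tricks do not break it, the robustness property noted above; real DMC/PPM filters use adaptive bit-level models rather than zlib's dictionary matching.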

OTHER FILTERING TECHNOLOGY

One of the most widely deployed spam-filtering techniques is similarity matching. Similarity-matching systems collect examples of known spam, for example, email sent to a special trap account that should receive no legitimate mail, or email users have complained about. They then try to match new messages against this known spam. Spammers actively randomize their email in an attempt to defeat these matching systems. In some cases (such as spam where the primary content is an image meant to defeat both matching-based and machine-learning-based text-oriented filters), spammers even randomize the image to defeat image-matching technologies. Promising recent research has focused on text-oriented matching systems; for instance, work at AOL [7] has used multiple different hashes to make the matching systems more robust to randomization, and work at IBM [10], inspired by bioinformatics, has sought to find characteristic subsequences of words that occur in spam but not in good email.
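The multiple-hash idea can be illustrated with word-level shingles: each message yields many overlapping signatures, so randomizing a few words changes only a few of them. This is a simplified sketch, not the actual AOL [7] scheme; the 5-word window and 0.5 threshold are arbitrary choices:

```python
import hashlib

def shingle_hashes(text: str, k: int = 5) -> set:
    # Hash every k-word window; randomizing a word changes only the few
    # windows that overlap it, leaving the other signatures intact.
    words = text.lower().split()
    return {hashlib.sha1(" ".join(words[i:i + k]).encode()).hexdigest()
            for i in range(max(1, len(words) - k + 1))}

def resembles(msg: str, known_spam: str, threshold: float = 0.5) -> bool:
    a, b = shingle_hashes(msg), shingle_hashes(known_spam)
    return len(a & b) / len(a) >= threshold

known = "click here now to buy cheap meds from our trusted online pharmacy today"
variant = known.replace("online", "web")  # spammer randomizes one word
print(resembles(variant, known))  # True: most 5-word windows survive the edit
print(resembles("please send the quarterly report before the meeting", known))  # False
```

A single hash of the whole message would miss the variant entirely, which is precisely the brittleness the multiple-hash work addresses.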

Similarity-matching systems are generally a good complement to machine-learning-based systems. They help prevent spammers from finding a single message that beats a learning-based filter, then sending it to hundreds of millions of users. To defeat a combined system, spammers must find email that beats the machine-learning system while randomizing the message in such a way that it simultaneously defeats the matching-based system.

Image-based spam is one way to attack both machine-learning systems and matching systems. In this form of spam, the text of a message is random (defeating matching systems) and innocuous (defeating machine-learning systems). Prominently displayed, perhaps before the text, is an image consisting entirely of a picture of text. Optical character recognition software is too slow to run on email servers and probably not accurate enough in an adversarial setting. These images were initially stored mostly on the Web and referenced with image-source links rather than embedded in the message. Because most messages are never opened, using links reduces bandwidth costs to spammers, allowing them to send even more spam.

Many email providers have responded by blocking most Web-based images while still allowing embedded images stored in the message itself. Spammers have responded by embedding their images in messages. These images were initially identical across millions of messages, so many spam-filter providers responded with image-matching algorithms. Spammers countered by randomizing the content of the images.
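This arms race (identical images, then image matching, then randomized images) hinges on exact matching being brittle. A toy sketch over an invented 8-"pixel" image shows why matching systems move toward perceptual-style hashes that tolerate small random changes; real systems operate on actual images with far more robust features:

```python
import hashlib

def exact_hash(pixels):
    # Cryptographic hash: any one-pixel tweak produces a different digest.
    return hashlib.sha256(bytes(pixels)).hexdigest()

def average_hash(pixels):
    # 1 bit per pixel: brighter than the image mean? Small perturbations
    # rarely flip bits, so near-duplicate images collide on purpose.
    mean = sum(pixels) / len(pixels)
    return "".join("1" if p > mean else "0" for p in pixels)

original = [200, 198, 30, 25, 210, 20, 199, 28]
tweaked  = [201, 198, 30, 25, 210, 20, 199, 28]  # spammer nudges one pixel

print(exact_hash(original) == exact_hash(tweaked))      # False
print(average_hash(original) == average_hash(tweaked))  # True
```

The spammer's randomization defeats the first comparison but not the second, which is why splitting an image into reassembled pieces, as described next, was their further escalation.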


HUMAN INTERACTION PROOFS

HIPs (also known as "completely automated public Turing tests to tell computers and humans apart," or CAPTCHAs, or just plain Turing tests) are a key component in preventing abuse. The most common type of HIP is an image of a sequence of letters and digits that has been automatically distorted. Among their many uses, before signing up for most free email accounts, users are required to solve one by correctly entering the sequence of letters and numbers in the image. Without HIPs, spammers would use these services to produce a torrent of spam (see the sidebar "Outbound Spam"). They are also used to prevent automated password attacks. Several products (such as MailBlocks and Matador) have used HIP challenges for suspected spam as a kind of economic approach. HIPs also prevent, for instance, the automated harvesting of Web site data and automated attempts to steal passwords.

As HIPs have been used more widely and become more critical in preventing abuse, it has become more important to understand just how robust and effective they are. Kumar Chellapilla, a scientist at Microsoft Live Labs, and his colleagues there and at Microsoft Research have studied HIPs in detail, finding them surprisingly vulnerable to attack. In [2], they reported that nearly all commercially deployed HIPs could be broken with high accuracy. Because the goal of a HIP is to prevent automation, adversaries can accept relatively low solution rates; if they fail, they just try again over and over. The most effective HIP tested in [2] could be broken only 5% of the time, while the worst could be solved 67% of the time.

Chellapilla and his colleagues then set out to design better HIPs. Since the goal of a HIP is to be too difficult for a computer while remaining readily solvable by a human, they conducted both computer studies and human studies. For each of seven different distortion techniques, they tried various levels of distortion, measuring the rate at which the problem became more difficult for humans to solve compared with the rate at which it became more difficult for computers.

An example of their experiments is shown in parts (a) and (b) of the figure here for the local-warp distortion type. For this distortion, at levels of 60 and 80, humans found the task of decoding and transcribing the HIP extremely difficult, while computer performance was barely affected. In all seven of their experiments, computers did as well as or better than humans at decoding the HIP as the distortion level increased.

This is a very disturbing outcome. If HIPs are automatically solvable by computers, the key barrier against mass automation of many abuses will be gone. There is hope, however. In other research, Chellapilla and his colleagues focused on building segmentation-based HIPs in which the key distortion makes it difficult to find word boundaries (see part (c) of the figure). This kind of segmentation distortion appears to be the only problem where computers are still inferior to humans. The human brain does this task effortlessly, but it remains an algorithmic and computational challenge for computer vision and handwriting recognition. Hopefully, this finger-in-the-dike approach is sufficient for stopping a flood of abuse.

[Sidebar figure: (a) local-warp distortion at four arbitrary parameter settings; (b) human versus computer accuracy, P(C), at four settings of local warp, where human accuracy falls more quickly than computer accuracy as the distortion increases; (c) three samples of a segmentation-based HIP.]
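HIPs charge senders in human attention; the computational puzzles of [4], discussed with the other payment-based systems in this article, charge in CPU time instead. A hedged, hashcash-style sketch (the message format and difficulty values are invented; real proposals differ in detail): the sender searches for a nonce whose hash meets a target, while the recipient verifies it with a single hash.

```python
import hashlib
from itertools import count

def solve(message: str, bits: int = 12) -> int:
    """Find a nonce whose SHA-256 with the message has `bits` leading zero
    bits. Costs about 2**bits hashes per message: negligible for one email,
    ruinous at spam volumes."""
    target = 1 << (256 - bits)
    for nonce in count():
        digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(message: str, nonce: int, bits: int = 12) -> bool:
    # One hash to check: verification stays cheap even though solving is not.
    digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - bits))

nonce = solve("example message")
print(verify("example message", nonce))  # True
```

Each additional difficulty bit doubles the sender's expected work while leaving verification at one hash, the asymmetry these economic proposals rely on.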



Some spammers even broke the images into multiple pieces, to be reassembled only when the HTML email is rendered. These randomized image-based messages with innocuous-looking text are especially difficult to identify through automated means.

Many payment-based systems have also been proposed over the years for spam filtering. Examples include those that require a time-consuming computation, first suggested in [4]; those that require solving a human interaction proof (see the sidebar "Human Interaction Proofs"), first suggested in [9]; and those that require making some form of cash micropayment, possibly refundable. Unfortunately, these economic approaches are difficult to deploy in practice. For computational puzzles and cash micropayments to succeed, these systems must be widely deployed, and in order to be widely deployed, there must be some expectation of success: a catch-22. In one exciting development, Microsoft Outlook 2007 includes computational puzzles, the first wide-scale deployment of a computational system to help stop spam. It will be interesting to observe the effectiveness of this approach in practice. For the related problem of outbound spam, stopping people from spamming through a public service like Hotmail, economic approaches have been surprisingly successful (see the sidebar "Outbound Spam").

Also worth mentioning are legislative attempts to stop spam (such as the 2003 CAN-SPAM Act, the Controlling the Assault of Non-Solicited Pornography and Marketing Act). Unfortunately, these legislative approaches have had only a limited effect (see Grimes's article on page 56). Much spam is sent internationally, and clever spammers are good at making themselves difficult to trace. Many forms of spam are already fraudulent or illegal (such as phishing scams and pump-and-dump stock schemes), so additional laws are likely to offer only incremental disincentives. Technology will continue to be the most important mechanism for stopping spam.

CONCLUSION

From the end-user point of view, spam appears to be roughly under control: an annoyance that has stabilized at a tolerable level. From the point of view of spam researchers and developers, it is an ongoing battle, with both spammers and spam fighters becoming ever more sophisticated.

As spam filtering has evolved, so has the community of spam fighters. One active part of that community is the Conference on Email and Anti-Spam (www.ceas.cc), begun in 2004. Many of the methods and results described here were first presented at CEAS conferences, which focus not just on spam research but on positive ways to improve email as well. Studies from the Pew Foundation show email to be the number-one application on the Internet, clearly deserving of its own research community. CEAS brings together academic researchers, industrial researchers, developers of email and spam products, and operators of email systems. The practical nature of spam fighting and email development encourages and requires a degree of collaboration and interaction across disciplines that is relatively rare in other areas of computer science.

Email spam is by no means the only type of abuse on the Internet. Almost any communication method involves a corresponding form of spam. For example, instant messaging systems are subject to IM spam (SPIM), and chat rooms are subject to chat spam (SPAT). A key problem for Internet search engines is Web spam, perpetrated by people who try to artificially boost the scores of Web pages to generate traffic. Other forms of abuse include click fraud: people clicking on advertisements to steal money or hurt competitors. It turns out that the same techniques can be used across these different types of spam. For instance, IP-address-based analyses can be very helpful for filtering spam; spammers respond by attempting to acquire a variety of IP addresses cheaply (such as by using zombies and open proxies); countermeasures for detecting zombies and open proxies can then be used to identify and stop the spam. Machine learning can also be applied to these forms of abuse, and work originally developed for email spam (such as learning optimized for low false-positive rates [12]) may be applied to these other areas as well.

We hope to solve the problem of email spam, removing the need for endless escalations and tit-for-tat countermeasures. When anti-spoofing technology is widely deployed, we'll be able to learn a positive reputation for all good senders. Economic approaches may be applied to the smallest senders, even to those who are unknown. Even more sophisticated machine-learning systems may be able to respond to spammers more quickly and robustly than spammers can adapt. Even when that day comes, plenty of interesting and important problems will remain. Spammers won't go away but will move to other applications, keeping us busy for a long time to come.

OUTBOUND SPAM

Even though most research on spam focuses on stopping inbound spam, outbound spam is an important issue as well. Spammers love to send email from free email providers; no one wants to block all the mail from a major service (such as Hotmail or Yahoo! Mail). In many ways, outbound spam is a more interesting research topic than its inbound counterpart. Imposing economic costs on senders to prevent inbound spam has proved difficult in practice. For outbound spam, however, the ESP controls the environment and tools and can more readily impose economic costs (such as computational resources, money, and HIPs). For instance, an ESP might provide all its users with the tools to solve computational puzzles. ESPs that charge a fee have access to credit card information, making it possible for them to impose monetary costs. ESPs typically control the user interface, making it easier for them to impose HIP challenges.

Our 2004 theoretical analysis of economic approaches to stopping outbound spam produced some interesting, somewhat surprising results [5]. The most interesting was that a technique that imposes costs initially, but then stops charging for additional messages, can be as effective as one that charges for every single message. This means users may at first be annoyed with HIPs, computation, or monetary costs, but after a while, these costs stop. Asymptotically, legitimate users of the system pay zero cost per message, but the costs to spammers can be kept high, ideally higher than the benefit they get from spamming in the first place. While most inbound spam research is empirical in nature, these are realistic, provable bounds on spammer costs.

One disappointing result was that rate limiting is surprisingly ineffective. When ESPs are notified that they are a major source of spam, a natural reaction is to impose rate limits on senders. We found that it is important for ESPs to include some sort of rate limiting; with no limits, spamming is very cheap. But past a certain point, rate limiting has almost no effect. Intuitively, if the rate limit is cut in half, it takes about twice as long to receive enough complaints to terminate an account; the same total spam is sent, and the spammer's cost per message is unchanged. If a spammer wishes to maintain his sending rate, he can purchase twice as many accounts. His up-front costs double, but the asymptotic costs per message stay the same.

Fortunately, we also found good ways to increase spammer costs, hopefully above the point at which spamming is cost-effective. Spammer costs are roughly inversely proportional to the number of messages a spammer can send before a complaint is received. A consequence is that any method that allows ESPs to learn about abusive accounts more quickly and terminate them will substantially raise the cost to spammers. One such system is the Windows Live Mail Smart Network Data Services (https://postmaster.live.com/snds/index.aspx), a public service provided by Microsoft that allows ISPs to quickly identify IP addresses they own that are major sources of spam, helping them take action quickly.

References
1. Bratko, A., Cormack, G., Filipic, B., Lynam, T., and Zupan, B. Spam filtering using statistical data compression models. Journal of Machine Learning Research 7 (Dec. 2006).
2. Chellapilla, K. and Simard, P. Using machine learning to break visual human interaction proofs. In Proceedings of the Advances in Neural Information Processing Systems (NIPS) Conference (Vancouver, Canada). MIT Press, 2005, 265–272.
3. Chellapilla, K., Simard, P., and Czerwinski, M. Computers beat humans at single character recognition in reading-based human interaction proofs (HIPs). In Proceedings of the Second Conference on Email and Anti-Spam (CEAS) (Palo Alto, CA, July 21–22, 2005).
4. Dwork, C. and Naor, M. Pricing via processing or combatting junk mail. In Proceedings of the 12th Annual International Cryptology Conference (Lecture Notes in Computer Science) (Santa Barbara, CA, Aug. 16–20). Springer, 1992, 137–147.
5. Goodman, J. and Rounthwaite, R. Stopping outgoing spam. In Proceedings of the ACM Conference on Electronic Commerce (EC'04) (New York, May 17–20). ACM Press, New York, 2004, 30–39.
6. Hulten, G., Penta, A., Seshadrinathan, G., and Mishra, M. Trends in spam products and methods. In Proceedings of the First Conference on Email and Anti-Spam (CEAS) (Mountain View, CA, July 30–31, 2004).
7. Kolcz, A., Chowdhury, A., and Alspector, J. The impact of feature selection on signature-driven spam detection. In Proceedings of the First Conference on Email and Anti-Spam (CEAS) (Mountain View, CA, July 30–31, 2004).
8. Messaging Anti-Abuse Working Group. MAAWG Email Metrics Program, First Quarter 2006 Report. June 2006; www.maawg.org/about/FINAL_1Q2006_Metrics_Report.pdf.
9. Naor, M. Verification of a Human in the Loop or Identification via the Turing Test; www.wisdom.weizmann.ac.il/~naor/.
10. Rigoutsos, I. and Huynh, T. Chung-Kwei: A pattern-discovery-based system for the automatic identification of unsolicited e-mail messages. In Proceedings of the First Conference on Email and Anti-Spam (CEAS) (Mountain View, CA, July 30–31, 2004).
11. Sahami, M., Dumais, S., Heckerman, D., and Horvitz, E. A Bayesian approach to filtering junk e-mail. In Learning for Text Categorization: Papers from the AAAI Workshop. AAAI Technical Report WS-98-05 (Madison, WI, 1998).
12. Yih, W., Goodman, J., and Hulten, G. Learning at low false positive rates. In Proceedings of the Third Conference on Email and Anti-Spam (CEAS) (Mountain View, CA, July 27–28, 2006).

Joshua Goodman ([email protected]) is a senior researcher at Microsoft Research, Redmond, WA.
Gordon V. Cormack ([email protected]) is a professor in the David R. Cheriton School of Computer Science at the University of Waterloo, Waterloo, ON, Canada.
David Heckerman ([email protected]) is a senior researcher at Microsoft Research, Redmond, WA.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

© 2007 ACM 0001-0782/07/0200 $5.00

Page 8: Spam and the Ongoing Battle for the Inbox

bilized at a tolerable level. From the point of view ofspam researchers and developers, it is an ongoingbattle, with both spammers and spam fightersbecoming ever more sophisticated.

A s spam filtering has evolved, so has thecommunity of spam fighters. Oneactive part of that community is theConference on Email and Anti-Spam(www.ceas.cc) begun in 2004. Many

of the methods and results described here were firstpresented at CEAS conferences, focusing not just onspam research but on positive ways to improve emailas well. Studies from the Pew Foundation show emailto be the number-one application on the Internet andclearly deserving of its own research community.CEAS brings together academic researchers, industrialresearchers, developers of email and spam products,and operators of email systems. The practical natureof spam fighting and email development encouragesand requires a degree of collaboration and interactionacross disciplines that is relatively rare in other areas ofcomputer science.

Email spam is by no means the only type of abuseon the Internet. Almost any communication methodinvolves a corresponding form of spam. For example,instant messaging systems are subject to IM Spam(SPIM), and chat rooms are subject to chat spam(SPAT). A key problem for Internet search engines isWeb spam perpetrated by people who try to artifi-cially boost the scores of Web pages to generate traf-fic. Other forms of abuse include click fraud, orpeople clicking on advertisements to steal money orhurt competitors. It turns out that the same techniquecan be used across these different types of spam. Forinstance, IP-address-based analyses can be very help-ful for filtering spam; spammers respond by attempt-ing to acquire a variety of IP addresses cheaply (suchas using zombies and open proxies); countermeasuresfor detecting zombies and open proxies can then beused to identify and stop the spam. Machine learningcan also be applied to these forms of abuse, and work(such as learning optimized for low-false positive rates,originally developed for email spam) may be appliedto these other areas as well.

We hope to solve the problem of email spam,removing the need for endless escalations and tit-for-tat countermeasures. When anti-spoofing technologyis widely deployed, we’ll be able to learn a positive rep-utation for all good senders. Economic approachesmay be applied to the smallest senders, even to thosewho are unknown. Even more sophisticated machine-learning systems may be able to respond to spammers

more quickly and robustly than they can adapt to.Even when that day comes, plenty of interesting andimportant problems will still have to be solved. Spam-mers won’t go away but will move to other applica-tions, keeping us busy for a long time to come.

References1. Bratko, A., Cormack, G., Filipic, B., Lynam, T., and Zupan, B. Spam

filtering using statistical data compression models. Journal of MachineLearning Research 7 (Dec. 2006).

2. Chellapilla, K. and Simard, P. Using machine learning to break visualhuman interaction proofs. In Proceedings of the Advances in NeuralInformation Processing Systems (NIPS) Conference (Vancouver, Canada).MIT Press, 2005, 265–272.

3. Chellapilla, K., Simard, P., and Czerwinski, M. Computers beathumans at single character recognition in reading-based human inter-action proofs (HIPs). In Proceedings of the Second Conference on Emailand Anti-Spam (CEAS) (Palo Alto, CA, July 21–22, 2005).

4. Dwork, C. and Naor, M. Pricing via processing or combatting junkmail. In Proceedings of the 12th Annual International Cryptology Confer-ence (Lecture Notes in Computer Science) (Santa Barbara, CA, Aug.16–20). Springer, 1992, 137–147.

5. Goodman, J. and Rounthwaite, R. Stopping outgoing spam. In Pro-ceedings of the ACM Conference on Electronic Commerce (EC’04) (NewYork, May 17–20). ACM Press, New York, 2004, 30–39.

6. Hulten, G., Penta, A., Seshadrinathan, G., and Mishra, M. Trends inspam products and methods. In Proceedings of the First Conference onEmail and Anti-Spam (CEAS) (Mountain View, CA, July 30–31,2004).

7. Kolcz, A., Chowdhury, A., and Alspector, J. The impact of featureselection on signature-driven spam detection. In Proceedings of the FirstConference on Email and Anti-Spam (CEAS) (Mountain View, CA, July30–31, 2004).

8. Messaging Anti-Abuse Working Group. MAAWG Email Metrics Pro-gram, First Quarter 2006 Report. June 2006;www.maawg.org/about/FINAL_1Q2006_Metrics_Report.pdf.

9. Naor, M. Verification of a Human in the Loop or Identification via theTuring Test; www.wisdom.weizmann.ac.il/~naor/.

10. Rigoutsos, I. and Huynh, T. Chung-Kwei: A pattern-discovery-basedsystem for the automatic identification of unsolicited e-mail messages.In Proceedings of the First Conference on Email and Anti-Spam (CEAS)(Mountain View, CA, July 30–31, 2004).

11. Sahami, M., Dumais, S., Heckerman, D., and Horvitz, E. A Bayesianapproach to filtering junk e-mail. In Learning for Text Categorization—Papers from the AAAI Workshop. AAAI Technical Report WS-98-05(Madison, WI, 1998).

12. Yih, W., Goodman, J., and Hulten, G. Learning at low false positiverates. In Proceedings of the Third Conference on Email and Anti-Spam(CEAS) (Mountain View, CA, July 27–28, 2006).

Joshua Goodman ([email protected]) is a senior researcherat Microsoft Research, Redmond, WA. Gordon V. Cormack ([email protected]) is a professor inthe David R. Cheriton School of Computer Science at the Universityof Waterloo, Waterloo, ON, Canada. David Heckerman ([email protected]) is a seniorresearcher at Microsoft Research, Redmond, WA.

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. To copy otherwise, to republish, to post on servers or to redistributeto lists, requires prior specific permission and/or a fee.

© 2007 ACM 0001-0782/07/0200 $5.00

c

COMMUNICATIONS OF THE ACM February 2007/Vol. 50, No. 2 33

the images; some even broke the images into multiplepieces to be reassembled only when HTML-basedemail is rendered. These randomized image-basedmessages with innocuous-looking text are especiallydifficult to identify through automated means.

Many payment-based systems have also been pro-posed over the years for spam filtering. Examplesinclude: those that require a time-consuming compu-tation, first suggested in [4]; those that require solvinga human interaction proof (see the sidebar “HumanInteraction Proofs”), first suggested in [9]; and thosethat require making some form of cash micropay-ment, possibly refundable. Unfortunately, these eco-nomic approaches are difficult to deploy in practice.For computational puzzles and cash micropaymentsto succeed, these systems must be widely deployed,and in order to be widely deployed, there must besome expectation of success—a catch-22. In oneexciting development, Microsoft Outlook 2007includes computational puzzles—the first wide-scaledeployment of a computational system to help stopspam. Observing the effectiveness of this approach in

practice will be interesting. For the related problem ofoutbound spam—stopping people spamming from apublic service like Hotmail—economic approacheshave been surprisingly successful (see the sidebar“Outbound Spam”).

Also worth mentioning are legislative attempts to stop spam (such as the 2003 CAN-SPAM Act, also known as the Controlling the Assault of Non-Solicited Pornography and Marketing Act). Unfortunately, these legislative approaches have had only a limited effect (see Grimes's article on page 56). Many forms of spam can be sent internationally, and clever spammers are good at making themselves difficult to trace. Many forms of spam are fraudulent or illegal (such as phishing scams and pump-and-dump stock schemes), so additional laws are likely to offer only incremental disincentives. Technology will continue to be the most important mechanism for stopping spam.

CONCLUSION

From the end-user point of view, spam appears to be roughly under control—an annoyance that has sta-


OUTBOUND SPAM

Even though most research on spam focuses on stopping inbound spam, outbound spam is an important issue as well. Spammers love to send email from free email providers; no one wants to block all the mail from a major service (such as Hotmail or Yahoo! Mail). In many ways, outbound spam is a more interesting research topic than its inbound counterpart. Imposing economic costs on senders to prevent inbound spam has proved difficult in practice. For outbound spam, however, the ESP controls the environment and tools and can more readily impose economic costs (such as computational resources, money, and HIPs). For instance, an ESP might provide tools for solving computational puzzles to all its users. ESPs that charge a fee have access to credit-card information, making it possible for them to impose monetary costs. ESPs typically control the user interface, making it easier for them to impose HIP challenges.

Our 2004 theoretical analysis of economic approaches to stopping outbound spam produced some interesting, somewhat surprising results [5]. The most interesting was that a technique that imposes costs initially (but then stops charging for additional messages) can be as effective as one that charges for every single message. This means that users may at first be annoyed with HIPs, computation, or monetary costs, but after a while, these costs stop. Asymptotically, legitimate users of the system pay zero cost per message, but the costs to spammers can be kept high, ideally higher than the benefit they get from spamming in the first place. While most inbound spam research is empirical in nature, these are realistic, provable bounds on spammer costs.
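The asymptotic argument can be made concrete with a toy model (our illustrative construction, not the formal analysis in [5]; the numbers and function name are assumptions): if only an account's first K messages incur a cost, the amortized cost vanishes for a long-lived legitimate account, while a spammer whose accounts are terminated quickly keeps paying the entry cost on fresh accounts.

```python
def cost_per_message(entry_cost: float, paid_messages: int,
                     total_sent: int) -> float:
    """Average cost per message when only the first `paid_messages`
    messages from an account cost `entry_cost` each."""
    charged = min(total_sent, paid_messages)
    return entry_cost * charged / total_sent

# A legitimate user keeps one account, so the amortized cost -> 0:
# 100 paid messages at $0.05 each, spread over a growing history.
for n in (100, 10_000, 1_000_000):
    print(n, cost_per_message(0.05, 100, n))   # 0.05, 0.0005, 0.000005

# A spammer's account is terminated after ~200 messages, so he must
# keep buying fresh accounts; his cost per message stays flat.
print(cost_per_message(0.05, 100, 200))        # 0.025
```

The gap between the two asymptotes is exactly what makes the scheme effective: honest senders pay a one-time nuisance, spammers pay forever.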

One disappointing result was that rate limiting is surprisingly ineffective. When ESPs are notified that they are a major source of spam, a natural reaction is for them to impose rate limits on senders. We found that it is important for ESPs to include some sort of rate limiting; with no limits, spamming is very cheap. But past a certain point, rate limiting has almost no effect. Intuitively, if the rate limit is cut in half, it takes about twice as long to receive enough complaints to terminate the account; the same total spam is sent, and the spammer's cost per message is unchanged. If a spammer wishes to maintain his sending rate, he can purchase twice as many accounts. His up-front costs double, but the asymptotic costs per message stay the same.
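The rate-limit intuition can be checked with a back-of-the-envelope model (again our own illustrative sketch, not the paper's derivation; all parameter values are made up): complaints accrue per message sent, so an account survives a roughly fixed number of messages no matter how fast it sends them.

```python
def spammer_economics(rate_limit: float, account_cost: float,
                      msgs_until_termination: int, target_rate: float):
    """Toy model: an account is shut down after a fixed number of
    messages, so the rate limit changes account lifetime and account
    count, but not the spammer's cost per message."""
    lifetime = msgs_until_termination / rate_limit       # time until shutdown
    accounts_needed = target_rate / rate_limit           # to sustain target_rate
    cost_per_msg = account_cost / msgs_until_termination # rate-independent
    return lifetime, accounts_needed, cost_per_msg

fast = spammer_economics(1000, 1.0, 10_000, 10_000)
slow = spammer_economics(500, 1.0, 10_000, 10_000)   # rate limit halved
# Halving the limit doubles lifetime and the accounts needed...
assert slow[0] == 2 * fast[0] and slow[1] == 2 * fast[1]
# ...but leaves the asymptotic cost per message unchanged.
assert slow[2] == fast[2]
```

By contrast, shrinking `msgs_until_termination` (catching abusive accounts sooner) raises `cost_per_msg` directly, which matches the paragraph that follows.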

Fortunately, we also found good ways to increase spammer costs, hopefully above the point at which spamming is cost-effective. Spammer costs are roughly inversely proportional to the number of messages a spammer can send before a complaint is received. A consequence is that any method that allows ESPs to more quickly learn about abusive accounts and terminate them will substantially raise the cost to spammers. One such system is the Windows Live Mail Smart Network Data Services (https://postmaster.live.com/snds/index.aspx), a public service provided by Microsoft that allows ISPs to quickly identify IP addresses they own that are major sources of spam, helping them quickly take action.
