Univ. of Alabama @ Birmingham, June 2013
Research of Alan Sprague: Using Data Mining to Combat Spam, Phishing, and Malware
Department of Computer and Information Sciences
University of Alabama at Birmingham
Computer Forensics at UAB
We offer BS and MS degrees with an emphasis on forensics; the Criminal Justice Department participates in these programs.
Research center: CIA/JFR: http://thecenter.uab.edu
Gary Warner's blog, “Cyber Crime and Doing Time”: http://garwarner.blogspot.com
My research:
  Spam
  Phishing
  Malware
Outline
This presentation will describe my research interests in spam and malware.
The next 9 slides: spam. Subsequent slides: malware.
Spam and the criminal web
70-80% of all email in the world is spam.
Spam enables various classes of antisocial activity:
  Spam advertises opportunities to buy counterfeit goods, for example, pills (possibly adulterated pills).
  Spam delivers phish, which commonly are intended to steal credentials to banks and other financial institutions.
  Spam delivers malware.
Spam: Clustering, not Classification
People commonly expect our research to be classification of emails as ham or spam: desired or undesired. They then expect us to help filter email, so that spam will not be delivered.
That is not our research. Instead, we start with a data file that we expect is entirely spam, and our goal is to cluster it into spam campaigns.
This is an important goal, because once we understand the various spam campaigns, we know which are the largest, and we know what type of criminal activity each campaign enables. This enables law enforcement to focus attention on the most harmful campaigns.
Background on Data Mining
Data Mining studies the challenges and opportunities offered by huge data files.
Three methods are central to Data Mining.
  Clustering: group together records in the data file if they resemble each other (without knowing the “meaning” of any resulting group, called a cluster).
  Classification: assign each record to one of several “classes”, each of which corresponds to a known type of data.
  Frequent sets and association rules
Our spam data
Each day: 1 million spam messages
Stored into UAB Spam Data Mine
Preprocessing of spam data
Parsing
  Subject
  Sender IP
  Sender name
  If body contains a URL: its domain, and IP
  Word count of body
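The parsing step listed above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the Data Mine's actual parser. The function name `parse_spam`, the field names, and the choice of reading the sender IP from the earliest `Received` header are all assumptions made for the example. Resolving the URL's domain to an IP needs a DNS lookup and is left out.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import urlparse

# Hypothetical patterns for this sketch: http(s) URLs and a bracketed IPv4 address.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def parse_spam(raw):
    """Extract the fields named on the slide from one raw email message."""
    msg = message_from_string(raw)
    sender_name, sender_address = parseaddr(msg.get("From", ""))

    # Assumption: the last Received header is the earliest hop and names the sender IP.
    received = msg.get_all("Received") or []
    ip_match = IP_RE.search(received[-1]) if received else None

    # Only plain single-part bodies are handled; multipart bodies are treated as empty.
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""

    urls = URL_RE.findall(body)
    return {
        "subject": msg.get("Subject", ""),
        "sender_name": sender_name,
        "sender_address": sender_address,
        "sender_ip": ip_match.group(1) if ip_match else None,
        "url_domain": urlparse(urls[0]).hostname if urls else None,
        "word_count": len(body.split()),
    }


# Made-up message for illustration only.
raw = (
    "Received: from mail.example.net ([203.0.113.7]) by mx.example.org\n"
    "From: Tam Smith <tam@example.com>\n"
    "Subject: Buy now\n"
    "\n"
    "Great prices at http://shop.example.com/deal today\n"
)
parsed = parse_spam(raw)
```

Each parsed message becomes one record with a handful of attributes, which is the form a clustering step can work with.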
Some spams, parsed
Subject Sender Sender Name Username
Order HCG online y5fh6 EfrenGriffith artq.com
Order HCG online vfe3ih Victor musicradio.com
Pfizer Inc Discount 43681 lefley uab.edu
Buy Cialis Online Tam Smith adeptis.com
Your LinkedIn blocked John Fial irs.gov
Goal, for the Spam Data Mine
Cluster each day’s emails to find the largest spam campaigns, and then look for clues: where are they coming from?
Relate each day’s clusters to the previous day’s clusters. Any new types of spam are considered “emerging threats”.
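The slides do not say which clustering algorithm is used, so the following is only one plausible sketch of the two goals. It assumes that two emails belong to the same campaign when they share a subject or a URL domain (connected components via union-find), and that a cluster is "emerging" when none of its subjects or domains appeared the previous day. The function names and the record layout are hypothetical.

```python
from collections import defaultdict


def cluster_emails(emails, keys=("subject", "url_domain")):
    """Connected components: emails are linked if they share a value for any key."""
    parent = list(range(len(emails)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}
    for i, email in enumerate(emails):
        for key in keys:
            value = email.get(key)
            if value is None:
                continue
            j = first_seen.setdefault((key, value), i)
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(emails)):
        groups[find(i)].append(i)
    # Largest clusters (candidate campaigns) first.
    return sorted(groups.values(), key=len, reverse=True)


def features(emails, cluster, keys=("subject", "url_domain")):
    """All (key, value) pairs that occur in a cluster."""
    return {(k, emails[i].get(k)) for i in cluster for k in keys
            if emails[i].get(k) is not None}


def emerging(today, today_clusters, yesterday, yesterday_clusters):
    """Clusters of today that share no feature with any cluster of yesterday."""
    seen = set()
    for cluster in yesterday_clusters:
        seen |= features(yesterday, cluster)
    return [c for c in today_clusters if not (features(today, c) & seen)]


# Made-up records for illustration only.
yesterday = [
    {"subject": "Order HCG online", "url_domain": "a.example"},
    {"subject": "Order HCG online", "url_domain": "b.example"},
]
today = [
    {"subject": "Order HCG online", "url_domain": "c.example"},
    {"subject": "Buy Cialis Online", "url_domain": "d.example"},
    {"subject": "Cheap pills", "url_domain": "d.example"},
]
today_clusters = cluster_emails(today)
new_threats = emerging(today, today_clusters, yesterday, cluster_emails(yesterday))
```

In this toy data the two emails pointing at `d.example` form the largest cluster of the day and are flagged as new, while the HCG email is matched to the previous day's campaign through its subject.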
Largest Cluster on a particular day
[Figure: the largest cluster, drawn as domain names and IP addresses in three subgroups (Subgroup 1, Subgroup 2, Subgroup 3), alongside an email screenshot.]
Domain names: agethough.com, numbertook.com, rolloccur.com, sincejust.com, xtpnttm.cn, vlxejzg.cn, aoibejp.cn, curbdta.cn, Ihusepod.cn, tyinoriv.cn
IP addresses: 110.52.8.253, 124.42.91.162, 91.213.33.10, 203.93.208.86, 218.75.144.6, 220.196.59.35, 60.191.239.150, 88.80.16.161, 159.226.7.162
Why Is This Work Useful?
Leading spammers use a large number of domains to counter domain blacklisting.
Shutting down those domains and their hosting servers can greatly cripple spammers’ ability to conduct spam-related cyber crimes.
Further investigation of domains and IP addresses may lead to the identities of spammers.
Transition
Spam clustering is an ongoing project. A different thrust is the study of malware. I describe two methods of static analysis of malware: using blocks and jumps (slide 16), and using strings (slides 17-23).
Malware
What is malware?
  A program that performs actions that the user does not want
  Executable file, i.e., machine code
Each day, we add 5000 new malwares to our database
Two types of analysis:
  Static analysis
  Dynamic analysis
Goals
Malwares belong to families, such as Zeus, Reveton, Perfect keylogger
Eventual goal: Put each malware into its family.
Current goal: Cluster malwares, based on their strings.
Static Analysis, using Blocks and Jumps
Method to encode malwares:
  Jumps (e.g. subroutines, and subroutine calls)
  Disassemble each malware, split it into “blocks”, compute a hash value for each block. Also find each jump, and write which block it is from and which it is to.
Result: each malware is a directed graph.
When malwares are encoded this way, malwares will be clustered together if their graphs are similar.
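As a rough illustration of this encoding, the sketch below assumes that disassembly has already produced the blocks (lists of instruction text) and the jumps (pairs of block ids); those inputs, the function names, and the use of SHA-256 are assumptions. The slide does not say how graph similarity is measured, so Jaccard similarity over the edge sets is used here purely as a stand-in.

```python
import hashlib


def encode_graph(blocks, jumps):
    """blocks: dict block_id -> list of instruction strings.
    jumps: list of (from_block_id, to_block_id).
    Returns (set of block hashes, set of (from_hash, to_hash) edges)."""
    def block_hash(instructions):
        text = "\n".join(instructions).encode("utf-8")
        return hashlib.sha256(text).hexdigest()[:16]

    hashes = {block_id: block_hash(ins) for block_id, ins in blocks.items()}
    nodes = set(hashes.values())
    edges = {(hashes[src], hashes[dst]) for src, dst in jumps}
    return nodes, edges


def graph_similarity(g1, g2):
    """Jaccard similarity of the edge sets (node sets if neither graph has edges)."""
    a, b = g1[1], g2[1]
    if not a and not b:
        a, b = g1[0], g2[0]
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up blocks for illustration: sample B reuses A's code and adds one block.
blocks_a = {0: ["push ebp", "mov ebp, esp"], 1: ["call sub_1"], 2: ["ret"]}
jumps_a = [(0, 1), (1, 2)]
blocks_b = {10: ["push ebp", "mov ebp, esp"], 11: ["call sub_1"],
            12: ["ret"], 13: ["nop"]}
jumps_b = [(10, 11), (11, 12), (12, 13)]

graph_a = encode_graph(blocks_a, jumps_a)
graph_b = encode_graph(blocks_b, jumps_b)
similarity = graph_similarity(graph_a, graph_b)
```

Because edges are stored as pairs of block hashes rather than block ids, two samples that contain the same code at different addresses still share edges.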
Static Analysis, using strings of printable characters at least 4 characters long, ending with \0
cxczxczxczxcc
Enter
%d-%02d-%02d_%02d-%02d-%02d-%d
JPEG Image saved successfully!^
Screenshot saving cancelled because of logging disabled.^
COXJPEGFile::fill_input_buffer : Catching CFileException^
%d-%d-%d_%d-%d-%d
_controlfp
1.12782
@.rsrc
Password:
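Extracting such strings from an executable can be sketched with a single regular expression over the raw bytes. This is a minimal sketch that takes "printable" to mean ASCII 0x20 through 0x7E, which is an assumption; the function name `extract_strings` is hypothetical.

```python
import re

# At least 4 printable ASCII characters, immediately followed by a NUL byte.
STRING_RE = re.compile(rb"([\x20-\x7e]{4,})\x00")


def extract_strings(data):
    """Return the NUL-terminated printable strings of length >= 4 found in data."""
    return [match.decode("ascii") for match in STRING_RE.findall(data)]


# Made-up bytes: "abc" is too short and "_controlfp" is not NUL-terminated.
sample = b"\x01\x02Enter\x00abc\x00\x90Password:\x00_controlfp\xff"
strings = extract_strings(sample)
```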
Data File for 1 Day
Each row is the list of strings in one malware.
A sample file of 5000 malwares looks like:
m1: cxczxczxczxcc, Enter, _controlfp, ….
m2: …………….
m3: …………….
m4: …………….
. . .
m5000: ………….
Frequent sets
A typical application is retail data.
  Data file: Purchases at a large store.
  Each record: List of purchases of one customer.
  Question: Which items are often bought together?
Our application: malware.
  Our data file: Strings in malwares.
  Each record: List of strings of one malware.
  Question: Which strings are often found together?
  Dual question: Which malwares have many common strings?
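The dual question can be made concrete with a small sketch that counts, for every pair of malwares, how many strings they share. It builds an inverted index from each string to the malwares containing it; the function name and the sample records are made up for illustration, and this direct pairwise count is not the frequent-set method itself.

```python
from collections import defaultdict
from itertools import combinations


def shared_string_counts(records):
    """records: dict malware_id -> set of strings.
    Returns dict (id1, id2) -> number of strings the two malwares have in common."""
    by_string = defaultdict(list)
    for malware_id in sorted(records):
        for s in records[malware_id]:
            by_string[s].append(malware_id)

    counts = defaultdict(int)
    for malware_ids in by_string.values():
        # Ids were appended in sorted order, so each pair is (smaller, larger).
        for pair in combinations(malware_ids, 2):
            counts[pair] += 1
    return dict(counts)


records = {
    "m1": {"Enter", "Password:", "_controlfp"},
    "m2": {"Enter", "Password:"},
    "m3": {"@.rsrc"},
}
pair_counts = shared_string_counts(records)
```

Here `m1` and `m2` share two strings, and `m3` shares none with either.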
Frequent sets: Tiny example
6 malwares (so 6 records), 4 strings.
The malwares:
a, b, c, d
b, c, d
a, c, d
a, b
c, d
b, d
Incidence matrix
a b c d
1 1 1 1
0 1 1 1
1 0 1 1
1 1 0 0
0 0 1 1
0 1 0 1
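For reference, the incidence matrix of the tiny example can be rebuilt from the six records in a few lines. The function name `incidence_matrix` is just a label chosen for this sketch.

```python
def incidence_matrix(records, items):
    """One row per record, one column per item; 1 if the record contains the item."""
    return [[1 if item in record else 0 for item in items] for record in records]


# The six malwares of the tiny example, each a set of strings.
records = [set("abcd"), set("bcd"), set("acd"), set("ab"), set("cd"), set("bd")]
matrix = incidence_matrix(records, "abcd")

# Column sums: in how many malwares each string occurs.
occurrences = [sum(column) for column in zip(*matrix)]
```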
Frequent sets: Tiny example
Strings a,c are a frequent set (records r1 and r3 contain both)
But a,c is not maximal, because d is in both records
Incidence matrix
    a   b   c   d
r1  *1  1   *1  *1
r2  0   1   1   1
r3  *1  0   *1  *1
r4  1   1   0   0
r5  0   0   1   1
r6  0   1   0   1
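The two statements about the set a,c can be checked directly on the example. In this sketch, `supporting_records` lists the records that contain a given set of strings, and `common_extensions` lists the extra strings present in all of those records; both names are hypothetical.

```python
RECORDS = {
    "r1": set("abcd"),
    "r2": set("bcd"),
    "r3": set("acd"),
    "r4": set("ab"),
    "r5": set("cd"),
    "r6": set("bd"),
}


def supporting_records(itemset, records=RECORDS):
    """Ids of the records that contain every string in itemset."""
    wanted = set(itemset)
    return sorted(rid for rid, record in records.items() if wanted <= record)


def common_extensions(itemset, records=RECORDS):
    """Strings outside itemset that occur in every record containing it."""
    rids = supporting_records(itemset, records)
    if not rids:
        return set()
    common = set.intersection(*(records[rid] for rid in rids))
    return common - set(itemset)


support_ac = supporting_records("ac")
extra_ac = common_extensions("ac")
extra_acd = common_extensions("acd")
```

For a,c the supporting records are r1 and r3, and d is a common extension, which is why a,c is not maximal; a,c,d has no further common extension.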
Closed frequent sets
A frequent set is closed if it equals the intersection of the records containing it.
Alternate definition: a closed set is a maximal all-ones submatrix.
Since rows and columns play the same role in this, one can let malwares and strings exchange roles.
Ex: Incidence matrix
    a   b   c   d
r1  *1  1   *1  *1
r2  0   1   1   1
r3  *1  0   *1  *1
r4  1   1   0   0
r5  0   0   1   1
r6  0   1   0   1
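The first definition translates into a short brute-force check on the example. The sketch below enumerates every subset of strings, so it is only suitable for tiny inputs and is not the algorithm used on the real data; the support threshold of 2 is an assumption chosen for the illustration.

```python
from itertools import combinations


def closure(itemset, records):
    """Intersection of all records containing itemset (None if no record does)."""
    wanted = set(itemset)
    containing = [record for record in records if wanted <= record]
    return frozenset(set.intersection(*containing)) if containing else None


def closed_frequent_sets(records, min_support):
    """Map each closed frequent set to its support, by exhaustive enumeration."""
    items = sorted(set().union(*records))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for record in records if candidate <= record)
            if support >= min_support and closure(candidate, records) == candidate:
                result[candidate] = support
    return result


records = [set("abcd"), set("bcd"), set("acd"), set("ab"), set("cd"), set("bd")]
closed = closed_frequent_sets(records, min_support=2)
```

With this threshold the set a,c is rejected because its closure is a,c,d, while a,c,d itself is closed with support 2 (records r1 and r3), matching the starred all-ones submatrix.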
Closed Frequent Sets for Malware Analysis
Wanted closed frequent sets, with threshold 30.
The lowest the state-of-the-art algorithm could do was 1000.
By being willing to discard strings that appear more than 10 times, we recently managed threshold 20.
Ongoing
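The pruning step mentioned above might look like the following sketch. The slide does not say how occurrences are counted, so this assumes a string's count is the number of malwares that contain it; the function name and the default cutoff parameter are illustrative.

```python
from collections import Counter


def discard_common_strings(records, max_occurrences=10):
    """records: list of sets of strings, one set per malware.
    Drops every string that is contained in more than max_occurrences records."""
    counts = Counter(s for record in records for s in record)
    return [{s for s in record if counts[s] <= max_occurrences}
            for record in records]


# Made-up data: "Enter" occurs in all 12 records, each "s<i>" in exactly one.
records = [{"Enter", "s%d" % i} for i in range(12)]
pruned = discard_common_strings(records)
```

Very common strings tend to carry little information about family membership, and removing them shrinks the data the frequent-set search has to handle.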
The end