thwarting emotet email conversation thread …...•scikit-learn python machine learning library...
TRANSCRIPT
Thwarting Emotet email conversation thread hijacking with clusteringOlivier Coutu & Pierre-Luc Vaudry
2
Olivier CoutuOperations Manager
Pierre-Luc VaudryR&D Director
PART 1
A War Story:Email conversation thread hijacking
4
IvanCargo Inc.
JaneShipping Company
5
6
SUSPICIOUS
7
8
IvanCargo Inc.
TomPharma Vision
9
10
SUSPICIOUS
Email conversation thread hijacking: how it works
● Compromised user,
emails siphoned out
● Content of emails replayed
● Attachment name
11
Email thread hijacking: A History
12
2017-05 2018-04 2018-07 2019-04 2019-08
NorthKoreanhackers
First time MO is
observed by ZEROSPAM
Emotetmakes
the news
Emotetis reportedto be using
this MO
We detect ourfirst sizeablewave usingclustering
Erratum regarding Emotet on 2019-08-02
The email conversation thread hijacking campaign samples we had from the
beginning of August turned out, after verification, not to be carrying Emotet. Over
time, we observed various other malware to be using email conversation thread
hijacking, such as Ursnif, QBot, and PredatorTheThief.
13
Suspicious cluster alert!
14
Time To Block
15
NEW
CAMPAIGN
LAUNCH
CLIENT
CONTACTS
US
30
MINUTES
Reducing the Time To Block
16
NEW
CAMPAIGN
LAUNCH
CLIENT
UNAWARE
& SECURE
CLUSTERING REACTION BY
ADJUSTING FILTERING
IN LESS THAN 20 MINUTES
Defending against email thread hijacking
17
Don’t whitelist
your contacts
Quarantine
suspicious files
Enable DMARC
verification
PART 2
Under the hood:Clustering-basedcampaign detection
Why clustering for email filtering?
19
Clustering is the task of grouping objects that are similar (in some sense)
together into groups (called clusters).
An email campaign is a coordinated set of individual email messages that are
deployed across a specific period of time with one specific purpose.
A spam campaign is a particular case of an email campaign where the means
and/or purpose are not legitimate.
Our email clustering implementation
20
Spam cluster: filtering view
21
Spam cluster: clustering view
22
Inside the clustering pipeline
23
Input:
metadata
from last
12 minutes
Linear
combination
of similarity
scores
DBSCAN
clustering
algorithm
Cluster
analysis
Alignment
with
previous
clusters
Output
to GUI
CYCLE EVERY 2-3 MINUTES
Why DBSCAN over other algorithms?
● Input: precomputed
distance matrix
● Outliers: emails not part
of a campaign
● Chains of similarity inside
a cluster
24DBSCAN visualisation © Naftali Harris, 2012-2019
How to evaluate and optimize
25
Adjust similarity
and clustering
parameters
manually
Collect a
metadata
corpus
Cluster with
manually
adjusted
parameters
Correct
clustering
results
Compute
Rand score
Optimize
parameters for
best Rand score
on corrected
clustering
What’s coming?
26
IMPROVE
FILTERING
PUBLISH INDICATORS
OF COMPROMISE
BLOCK ENTIRE
CAMPAIGN
IMPROVE MONITORING OF
CAMPAIGNS OVER TIME
IMPROVE CAMPAIGN
CHARACTERISATION
Clustering: a generic option in your toolset
27
Detect diverse
emerging threats
Applicable to
many data
streams
Quickly
implemented
Acknowledgements
28
Vlad StamateSecurity Analyst
Luca NagyThreat Researcher
References
DBSCAN• https://www.naftaliharris.com/blog/visualizing-dbscan-clustering/
• Scikit-learn Python machine learning library
Clustering emails• Qian et al. (2010). A Case for Unsupervised-learning-based Spam Filtering
• Chen et al. (2014). Clustering Spam Campaigns with Fuzzy Hashing
• Sarantopoulos, C. (2015). Identification and on-line incremental clustering of spam campaigns
• Patidar et al. (2016). Activity Detection from Email Meta-Data Clustering
• Smirnov (2018). Clustering and classification methods for spam analysis
Emotet timeline• https://unit42.paloaltonetworks.com/unit42-freemilk-highly-targeted-spear-phishing-campaign/
• https://www.zdnet.com/article/emotet-hijacks-email-conversation-threads-to-insert-links-to-malware/
29
Questions?www.zerospam.ca