temporal analytics for predictive cyber threat intelligence · 4. predictive analytics our data can...

2
Temporal Analytics for Predictive Cyber Threat Intelligence [Tool Presentation Abstract] Staffan Truvé Recorded Future [email protected] ABSTRACT Recorded Future has developed its Temporal Analytics En- gine as a general purpose platform for harvesting and ana- lyzing unstructured text from the open, deep, and dark web, and for transforming that content into a structured repre- sentation suitable for different analyses. In this paper we present some of the key components of our system, and show how it has been adapted to the in- creasingly important domain of cyber threat intelligence. We also describe how our data can be used for predictive analytics, e.g. to predict the likelihood of a product vulner- ability being exploited or to assess the maliciousness of an IP address. 1. TEMPORAL ANALYTICS Temporal Analytics is by Recorded Future’s definition the entire process of going from unstructured text describing events in the world, via a structured representation of world events, to analyses that describe the current state of world affairs or predict future events. Our structured represen- tation is based on entities (persons, places, organizations, technologies, etc.) and events that relate entities to an event type and a set of metadata such as the the source of an event mention, its publishing time, and a derived event time. Analysis of this data can range from finding individ- ual event mentions to quantitative analysis of the number of mentions, publishing and event time patterns, and machine learning based predictive models. Our Temporal Analytics Engine today harvests and analyses millions of documents – ranging from short tweets to long articles and reports – every day. The coverage is very broad, from government and big media sites to blogs, forums, and social media. To provide relevant data for cyber threat analysis, we have broadened the scope of our harvesting to include cyber related media, blogs, feeds, forums, as well as paste sites (web sites used to share code, logs, data dumps etc. anonymously) and parts of the ”dark web”– primarily Tor/Onion sites. Copyright is held by the International World Wide Web Conference Com- mittee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author’s site if the Material is used in electronic media. WWW’16 Companion, April 11–15, 2016, Montréal, Québec, Canada. ACM 978-1-4503-4144-8/16/04. http://dx.doi.org/10.1145/2872518.2889294 . MalwareCategory "Microsoft Windows malware" Malware "Zeus" AttackVector "Trojan" Product "Microsoft Windows" CyberVulnerability "CVE-2011-3402" MalwareSignature "Win32/Zbot" MalwareSignature "Trojan/Zbot" Product "Windows TrueType" Vector Category Exploits Exploits Exploits Affects Affects Affects Signature Signature Hash "f5163b0edadcf40ada4ae 0594c399ef2b2058489" IOC Vector Company "Microsoft" Company "Kaspersky" Domain Domain Figure 1: An example of the Cyber Ontology 2. CYBER ONTOLOGY To adapt our system to cyber threat intelligence, we also added new entity and event types to represent this domain. In addition to standard named entities like persons, places, organizations etc. we added a set of new entity types of spe- cific interest to the cyber domain. These include: malware, malware categories, attack vectors, vulnerabilities, and tech- nical indicators such as hashes and file names. Figure 1 illustrates the entity ontology with some examples. Recorded Future detects a large set of events, ranging from person and corporate related ones to geopolitical and envi- ronmental events. Each type has a set of (sometimes op- tional) named attributes. To extend our world model to the cyber threat space, we added new event detectors for cyber attacks and vulnerabil- ity exploits. A cyber attack event relates an attacker to a target, and includes additional information about the attack method used and related hacktivist operation hashtags. All arguments are optional, but at least an attacker or a target must be identified. 3. DASHBOARDS AND TIMELINES The data collected by Recorded Future can be analyzed in our web UI, which provides powerful ways of filtering and visualizing events, individually or in aggregated form. Fig- ure 2 shows the cyber dashboard which gives an aggregated 60-day overview of cyber attack events, and Figure 3 is a line view illustrating the varying pattern of references to 867

Upload: others

Post on 11-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Temporal Analytics for Predictive Cyber Threat Intelligence · 4. PREDICTIVE ANALYTICS Our data can be used directly for some predictive analyt-ics, since certain future events such

Temporal Analytics for Predictive Cyber Threat Intelligence

[Tool Presentation Abstract]

Staffan TruvéRecorded Future

[email protected]

ABSTRACTRecorded Future has developed its Temporal Analytics En-gine as a general purpose platform for harvesting and ana-lyzing unstructured text from the open, deep, and dark web,and for transforming that content into a structured repre-sentation suitable for different analyses.

In this paper we present some of the key components ofour system, and show how it has been adapted to the in-creasingly important domain of cyber threat intelligence.

We also describe how our data can be used for predictiveanalytics, e.g. to predict the likelihood of a product vulner-ability being exploited or to assess the maliciousness of anIP address.

1. TEMPORAL ANALYTICSTemporal Analytics is by Recorded Future’s definition the

entire process of going from unstructured text describingevents in the world, via a structured representation of worldevents, to analyses that describe the current state of worldaffairs or predict future events. Our structured represen-tation is based on entities (persons, places, organizations,technologies, etc.) and events that relate entities to anevent type and a set of metadata such as the the source ofan event mention, its publishing time, and a derived eventtime. Analysis of this data can range from finding individ-ual event mentions to quantitative analysis of the number ofmentions, publishing and event time patterns, and machinelearning based predictive models. Our Temporal AnalyticsEngine today harvests and analyses millions of documents –ranging from short tweets to long articles and reports – everyday. The coverage is very broad, from government and bigmedia sites to blogs, forums, and social media. To providerelevant data for cyber threat analysis, we have broadenedthe scope of our harvesting to include cyber related media,blogs, feeds, forums, as well as paste sites (web sites used toshare code, logs, data dumps etc. anonymously) and partsof the ”dark web” – primarily Tor/Onion sites.

Copyright is held by the International World Wide Web Conference Com-mittee (IW3C2). IW3C2 reserves the right to provide a hyperlink to theauthor’s site if the Material is used in electronic media.WWW’16 Companion, April 11–15, 2016, Montréal, Québec, Canada.ACM 978-1-4503-4144-8/16/04.http://dx.doi.org/10.1145/2872518.2889294 .

MalwareCategory"Microsoft Windows malware"

Malware"Zeus"

AttackVector"Trojan"

Product"Microsoft Windows"

CyberVulnerability"CVE-2011-3402"

MalwareSignature"Win32/Zbot"

MalwareSignature"Trojan/Zbot"

Product"Windows TrueType"

Vector

Category

ExploitsExploits

Exploits

Affects

AffectsAffects

Signature

Signature

Hash"f5163b0edadcf40ada4ae0594c399ef2b2058489"

IOCVector

Company"Microsoft"

Company"Kaspersky"

Domain

Domain

Figure 1: An example of the Cyber Ontology

2. CYBER ONTOLOGYTo adapt our system to cyber threat intelligence, we also

added new entity and event types to represent this domain.In addition to standard named entities like persons, places,organizations etc. we added a set of new entity types of spe-cific interest to the cyber domain. These include: malware,malware categories, attack vectors, vulnerabilities, and tech-nical indicators such as hashes and file names. Figure 1illustrates the entity ontology with some examples.

Recorded Future detects a large set of events, ranging fromperson and corporate related ones to geopolitical and envi-ronmental events. Each type has a set of (sometimes op-tional) named attributes.

To extend our world model to the cyber threat space, weadded new event detectors for cyber attacks and vulnerabil-ity exploits. A cyber attack event relates an attacker to atarget, and includes additional information about the attackmethod used and related hacktivist operation hashtags. Allarguments are optional, but at least an attacker or a targetmust be identified.

3. DASHBOARDS AND TIMELINESThe data collected by Recorded Future can be analyzed

in our web UI, which provides powerful ways of filtering andvisualizing events, individually or in aggregated form. Fig-ure 2 shows the cyber dashboard which gives an aggregated60-day overview of cyber attack events, and Figure 3 is aline view illustrating the varying pattern of references to

867

Page 2: Temporal Analytics for Predictive Cyber Threat Intelligence · 4. PREDICTIVE ANALYTICS Our data can be used directly for some predictive analyt-ics, since certain future events such

Figure 2: Cyber Dashboard showing trending cyberattack events.

Figure 3: 2015 timeline for BlackEnergy malware.

the BlackEnergy malware during all of 2015. The user caninteractively filter and zoom to drill down into details.

4. PREDICTIVE ANALYTICSOur data can be used directly for some predictive analyt-

ics, since certain future events such as hacktivist attacks arementioned ahead of time (see Figure 4 as an example), thusgiving the defenders advance warning of upcoming threats.For other use cases, predictive models need to be constructedbased on historic data. We provide two examples of suchpredictive models in the cyber threat domain.

Predicting ExploitsOne interesting use of our data is to build a model forpredicting which published product vulnerabilities are mostlikely to be exploited. In [1] we describe how a SupportVector Machine classifier can be trained to predict the like-lihood of a vulnerability being exploited, with an accuracyof 0.83 (precision = 0.82, recall = 0.84), when trained on adata set consisting of 7,528 samples from 2010-01-01 - 2014-12-31, with an equal amount of exploited and unexploitedvulnerabilities. One noteworthy result is also that the textused to describe vulnerabilities is more informative than theparameters assigned in the National Vulnerability Database(nvd.nist.gov).

Figure 4: Future cyber attacks known on March 31st2015 – a planned Anonymous attack on Israel onApril 7th is the major known future event.

Figure 5: Risk scoring and entity information for IPaddress

Maliciousness of IP addressesA second example of analytics based on our data is to assigna risk score to an IP address. Risk scores are valuable forthreat analysts who want to get a quick aggregate score,to decide whether to explore an IP in further detail, andcan also be used to automate blocking of IP addresses bynetwork security equipment. Figure 5 show an informationcard for an IP address, with its risk score and associatedinformation.

5. CONCLUSIONSWe have presented the basics of the Temporal Analytics

Engine, and described how it has been adapted to the cy-ber threat intelligence domain. We have also illustrated itsuse in predictive analytics by showing two examples relatedto the prediction of cyber exploits and the assessment ofmaliciousness of IP addresses.

6. REFERENCES[1] M. Edkrantz and A. Said. Predicting cyber

vulnerability exploits with machine learning. InScandinavian Conference on Artificial Intelligence,pages 48–57, 2015.

868