Raffael Marty, CEO
Visualization In The Age of Big Data
HoneyNet Project Workshop, Stavanger, Norway
May 2015
Security. Analytics. Insight.
How Compromises Are Detected
Mandiant M-Trends 2014 Threat Report
Attackers in networks before detection: 229 days
Average time to resolve a cyber attack: 27 days
Seems Like Cyber Security Is Not Working
Monitoring To The Rescue
Breaches can be detected (early), or even prevented, if we looked at the data
Interactive Visualization
I am Raffy - I do Viz!
IBM Research
Overview
• Security Landscape
• What is Going Wrong?
• A New Approach
• Security Analytics
• Big Data Lake
• Visualization
• Challenges
• Data Discovery and Exploration
• Examples
[Diagram: data sources (Firewall, IPS, Proxy, AV, Endpoint, …) feed a SIEM, surrounded by monitoring tools (Scoring, Behavior, Log Mgmt, Threat Feeds, Context, Ticket, IR, False Positive, Manual Triage, Sandboxes, …)]
We Are Monitoring - What is Going Wrong?
Defense Has Been Relying On Past Knowledge
• Products / Tools
  • Firewall - Blocks traffic based on pre-defined rules
  • Web Application Firewall - Monitors Web traffic for signs of known malicious activity
  • Intrusion Prevention System - Looks for ‘signs’ of known attacks and protocol violations in traffic
  • Anti-Virus - Looks for ‘signs’ of known attacks on the end system
  • Malware Sandbox - Runs new binaries and monitors their behavior for malicious signs
  • Security Information Management - Uses pre-defined rules to correlate signs from different data streams to augment intelligence
  • Vulnerability Scanning - Searches for known vulnerabilities and vulnerable software
• Rely on pattern matching and signature-based knowledge from the past
• Reactive -> always behind
• Unknown and new threats -> won’t be detected
• ‘Imperfect’ patterns and rules -> cause a lot of false positives
Security Analytics
A New Approach
ENABLE analysts to leverage their knowledge effectively and efficiently
• scalability - big data based, extensible platform
• visualization - interactive exploration of billions of events
• knowledge - capture from experts
- leverage machines to guide
- automate where possible
- enable collaboration
We Need Analysts in the Loop!
(not better algorithms)
Use-Cases Enabled Through Analytics
• Intercept attacks (APT) early in the kill chain
• Detecting intrusions
• Detecting data leaks
• Network-based anomaly detection
• Threat Intelligence
• Attack surface analysis
• Speed up forensic investigations and incident response
• Insider threat detection
• User behavior monitoring
• Privilege abuse
• Fraud detection
• Compliance
• Continuous monitoring
• Risk quantification and metrics
• Business improvements
• Spending justification for security
• Spending optimization (esp. cloud)
[Screenshot: analytics interface with tabs Data Stores, Analytics, Forensics, Models, Admin; a host communication graph; and an anomaly view with panels Anomalies, Decomposition (Data, Seasonal, Trend), and Anomaly Details]
• Find Intruders and ‘New Attacks’
• Resolve Incidents Quicker
• Communicate Findings
Analytics Platform - How It’s Done
[Diagram components: Security Big Data Lake (data, context); Rules, Patterns, Scoring; Behavior Anomaly Detection; Analytics; Visualization; supported activities: Explore & Hunt, Visual Forensics, Alert Triage]
• Visualization in the center
• Not relying on past knowledge
• Analytics to support, not to alert
Visualization
Visualization To …
• Present / Communicate
• Discover / Explore
Unknown Unknowns - Visualization Is Central
“There are 1000 ways for someone to steal information. If we knew how, we could have prevented it. Visualization helps find that one way.”
- CISO, UBS Switzerland
Visualization Example (Unknown Unknowns)
PixlCloud is a visual analytics platform for cyber security.
This example shows a heatmap of behavior over time.
In this case, we see activity per user. We can see that ‘vincent’ is visually different from all of the other users: he shows up very lightly over the entire time period. This seems to be something to look into.
We were able to find this purely visually, without understanding the data more deeply.
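The heatmap behind this example is a user-by-time matrix of event counts. A minimal sketch of building that matrix and, additionally, expressing the ‘vincent’ pattern (present in almost every time bucket, but always at low volume) as a rule. The `(user, bucket)` event format, the function names, the thresholds, and the sample data are hypothetical; on the slide the anomaly was found visually, not by a rule.

```python
from collections import Counter, defaultdict


def activity_matrix(events, buckets):
    """Build the user x time-bucket count matrix behind an activity heatmap.

    events: iterable of (user, bucket_index) pairs (hypothetical format).
    """
    counts = defaultdict(Counter)
    for user, bucket in events:
        counts[user][bucket] += 1
    return {user: [c[b] for b in range(buckets)] for user, c in counts.items()}


def low_and_steady(matrix, coverage_min=0.9, intensity_max=2.0):
    """Flag users present in almost every bucket but always at low volume.

    Both thresholds are illustrative assumptions.
    """
    flagged = []
    for user, row in matrix.items():
        active = [v for v in row if v > 0]
        coverage = len(active) / len(row)
        mean_active = sum(active) / len(active) if active else 0.0
        if coverage >= coverage_min and mean_active <= intensity_max:
            flagged.append(user)
    return sorted(flagged)


# Synthetic example: two bursty users and one light-but-constant user.
events = [("alice", b) for b in (1, 2, 3) for _ in range(20)]
events += [("bob", b) for b in (5, 6) for _ in range(15)]
events += [("vincent", b) for b in range(24)]
matrix = activity_matrix(events, buckets=24)
print(low_and_steady(matrix))  # ['vincent']
```

A rule like this only catches the pattern once someone has seen it and described it; the heatmap is what surfaced it in the first place.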
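Why Visualization?
[Figure: Anscombe's quartet, four datasets whose summary statistics are nearly identical (the stats ...) but whose plots look completely different (the data ...)]
http://en.wikipedia.org/wiki/Anscombe%27s_quartet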
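The quartet makes the point numerically: all four datasets share almost the same summary statistics. The sketch below recomputes those statistics in plain Python, using the values listed on the linked Wikipedia page; `QUARTET` and `summary` are illustrative names. Only a plot reveals how different the four datasets are.

```python
from statistics import mean, variance

# Anscombe's quartet, values as listed on the linked Wikipedia page.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(xs, ys):
    """Means, sample variances, Pearson r, and least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": variance(xs), "var_y": variance(ys),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope, "intercept": my - slope * mx,
    }


# Each row prints (to two decimals) roughly: mean_x 9, mean_y 7.5, var_x 11,
# var_y 4.13, r 0.82, slope 0.5, intercept 3.0.
for name, (xs, ys) in QUARTET.items():
    print(name, {key: round(value, 2) for key, value in summary(xs, ys).items()})
```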
Why Visualization?
http://en.wikipedia.org/wiki/Anscombe%27s_quartet
Human analyst:
• pattern detection
• remembers context
• fantastic intuition
• can predict
Visualization Challenges
• Access to data
• Parsed data and data context
• Data architecture for central data access and fast queries
• Application of data mining (how?, what?, scalable, …)
• Visualization tools that support:
  • Complex visual types (parallel coordinates (||-coordinates), treemaps, heat maps, link graphs)
  • Linked views
  • Data mining (clustering, …)
  • Visual analytics workflow
Enablement - Data Layer Requirements
Access paradigms for a backend:
• Analytical queries - mainly for visual interaction
• Accessing large amounts of data in aggregated ways
• Support for intelligent caching (avoid slow re-queries of data)
• Statistics - answering frequent ‘aggregation’ queries very fast
• Ad-hoc search
• Raw data retrieval
• Context - dealing with data context for time-series data
Note: No mention of HADOOP!
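Two of these requirements, intelligent caching and very fast answers to frequent aggregation queries, can be illustrated with pre-aggregated rollups. A minimal in-memory sketch, assuming events arrive as `(timestamp, record)` pairs and that hourly granularity is good enough for visual queries; `RollupStore` and its methods are hypothetical names, and a real backend would persist these structures.

```python
from collections import Counter, defaultdict

BUCKET = 3600  # rollup granularity in seconds (assumed: one hour)


class RollupStore:
    """Hypothetical store keeping raw events plus hourly per-field counts.

    Frequent aggregation queries sum a few pre-computed counters instead of
    re-scanning raw data; repeated queries are served from a small cache.
    """

    def __init__(self, fields):
        self.fields = fields
        self.raw = []                        # for raw data retrieval / ad-hoc search
        self.rollups = defaultdict(Counter)  # (bucket, field) -> Counter of values
        self._cache = {}

    def ingest(self, ts, record):
        self.raw.append((ts, record))
        bucket = ts // BUCKET
        for field in self.fields:
            self.rollups[(bucket, field)][record[field]] += 1
        self._cache.clear()  # simplest possible invalidation

    def top(self, field, start, end, n=3):
        """Top-n values of `field` over [start, end), widened to whole buckets."""
        first, last = start // BUCKET, (end - 1) // BUCKET
        key = (field, first, last, n)
        if key not in self._cache:
            total = Counter()
            for bucket in range(first, last + 1):
                total.update(self.rollups.get((bucket, field), Counter()))
            self._cache[key] = total.most_common(n)
        return self._cache[key]


store = RollupStore(fields=["dst_port"])
for i, port in enumerate([53, 53, 80, 443, 53, 80]):
    store.ingest(i * 600, {"dst_port": port})
print(store.top("dst_port", 0, 3600))  # [(53, 3), (80, 2), (443, 1)]
```

The trade-off is resolution: answers come back at bucket granularity, while the raw list still serves ad-hoc search and raw data retrieval.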
Big Data Lake
The Big Data Lake
• One central location to store all cyber security data
• “Data collected only once and third party software leveraging it”
• Scalability and interoperability
• Hard problems:
  • Parsing: can you re-parse?
  • Data store capabilities (search, analytics, distributed processing, etc.)
  • Access to data: SQL (even in Hadoop context), how can products access the data?
Prevent Re-Collection?
The Security Data Lake - Federated Data Access
[Diagram components: SIEM (SIEM connector, SIEM console); a dispatcher; products Prod A and Prod B; data sources AD / LDAP, HR, IDS, FW, DBs, SNMP, …; Data Lake]
Many many challenges!
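The federated idea is to leave data where it is and route queries to it. A minimal sketch of a dispatcher, assuming each source can be wrapped in a Python callable that filters its own records; `Dispatcher`, `register`, `in_memory`, and the sample records (documentation-range addresses) are hypothetical. It deliberately ignores the hard parts the slide alludes to, such as schema mapping, access control, and sources that fail or time out.

```python
from concurrent.futures import ThreadPoolExecutor


class Dispatcher:
    """Hypothetical federated query dispatcher.

    Each registered source (a SIEM, a product database, AD / LDAP, ...) is a
    callable that filters its own records; the dispatcher fans a query out
    to all sources in parallel and tags each result with where it came from.
    """

    def __init__(self):
        self.sources = {}

    def register(self, name, query_fn):
        self.sources[name] = query_fn

    def query(self, predicate):
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(fn, predicate) for name, fn in self.sources.items()}
            # Collect in registration order; .result() re-raises source errors.
            return [
                {"source": name, **row}
                for name, future in futures.items()
                for row in future.result()
            ]


def in_memory(rows):
    """Wrap a list of records as a queryable source (stand-in for a real connector)."""
    return lambda predicate: [r for r in rows if predicate(r)]


dispatcher = Dispatcher()
dispatcher.register("FW", in_memory([
    {"src": "192.0.2.10", "dport": 53},
    {"src": "192.0.2.11", "dport": 80},
]))
dispatcher.register("IDS", in_memory([{"src": "192.0.2.11", "sig": "port scan"}]))
hits = dispatcher.query(lambda r: r.get("src") == "192.0.2.11")
print(hits)
```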
Data Lake Version 0.5a
[Diagram components: SIEM (SIEM connector, SIEM console); raw logs; processing and filtering; HDFS lake; a columnar store, search engine, or log management layer; a SQL or search interface]
Current solutions (log mgmt / SIEM):
- not open
- don’t scale
Data Discovery & Exploration
Visualize Me Lots (>1TB) of Data
Information Visualization Mantra
• Overview
• Zoom / Filter
• Details on Demand
Principle by Ben Shneiderman
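Applied to event data, the three steps of the mantra map onto three kinds of queries. A minimal sketch, assuming events are dictionaries with an epoch-seconds `ts` field; `overview`, `zoom_filter`, `details`, and the sample events are hypothetical.

```python
from collections import Counter


def overview(events, bucket=3600):
    """Overview: event counts per time bucket, i.e. what a timeline draws."""
    return sorted(Counter(e["ts"] // bucket for e in events).items())


def zoom_filter(events, start, end, **match):
    """Zoom / filter: events in [start, end) whose fields equal every `match` value."""
    return [e for e in events
            if start <= e["ts"] < end and all(e.get(k) == v for k, v in match.items())]


def details(events, limit=10):
    """Details on demand: the raw records behind the current selection."""
    return events[:limit]


events = [{"ts": 100, "dport": 53}, {"ts": 200, "dport": 80},
          {"ts": 4000, "dport": 53}, {"ts": 4100, "dport": 53}]
print(overview(events))                   # [(0, 2), (1, 2)]
selection = zoom_filter(events, 3600, 7200, dport=53)
print(details(selection))                 # [{'ts': 4000, 'dport': 53}, {'ts': 4100, 'dport': 53}]
```

Only the last step touches raw records, which is what keeps the overview cheap enough to compute over very large datasets.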
SecViz Examples
Add Context
Additional information about objects, such as:
• machine: roles, criticality, location, owner, …
• user: roles, office location, …
[Figure: source-destination graph with machine and user context, nodes annotated with machine role and user role]
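Adding context amounts to joining each event with lookup tables keyed by machine and by user. A minimal sketch, assuming events are dictionaries with `src`, `dst`, and `user` keys; the `MACHINES` and `USERS` tables, their values, and `add_context` are hypothetical, standing in for an asset inventory and a directory service such as AD / LDAP.

```python
# Hypothetical context tables; in practice they would come from an asset
# inventory and a directory service such as AD / LDAP.
MACHINES = {"192.0.2.20": {"role": "web server", "criticality": "high"}}
USERS = {"alice": {"role": "sales"}}


def add_context(event, machines=MACHINES, users=USERS):
    """Return a copy of `event` with machine and user context attached."""
    enriched = dict(event)
    for side in ("src", "dst"):
        ctx = machines.get(event.get(side), {})
        enriched[f"{side}_role"] = ctx.get("role", "unknown")
        enriched[f"{side}_criticality"] = ctx.get("criticality", "unknown")
    enriched["user_role"] = users.get(event.get("user"), {}).get("role", "unknown")
    return enriched


event = {"src": "192.0.2.10", "dst": "192.0.2.20", "user": "alice"}
print(add_context(event))
```

Marking missing context as "unknown" rather than dropping the field keeps gaps in the inventory visible in the graph.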
Traffic Flow Analysis With Context
An Analytical Example - Monitor Password Resets
[Chart: password resets over time with a fixed threshold; the outliers have different magnitudes]
Approximate Curve
[Chart: fitting a curve to the data and measuring each point's distance to the curve]
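One way to turn “distance to the curve” into numbers: fit a curve to the daily counts and score each day by its residual, which also preserves how large each outlier is. A minimal sketch, assuming the simplest possible curve (a least-squares straight line) and a cutoff of k times the median absolute residual; the slide does not say which curve or cutoff was used, and the counts are synthetic.

```python
from statistics import mean, median


def fit_line(ys):
    """Least-squares straight line through (day index, count) pairs."""
    xs = range(len(ys))
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def distance_outliers(ys, k=3.0):
    """Return (day, residual) for days farther than k * median |residual| from the line.

    The residual keeps both the direction and the magnitude of each outlier.
    """
    slope, intercept = fit_line(ys)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(ys)]
    scale = median(abs(r) for r in residuals) or 1.0
    return [(x, round(r, 1)) for x, r in enumerate(residuals) if abs(r) > k * scale]


resets = [10, 11, 11, 12, 12, 13, 40, 14, 14, 15, 15, 16]  # synthetic daily counts
print(distance_outliers(resets))  # [(6, 24.4)]
```

Because the spike also pulls the fitted line toward itself, a robust fit or a smoother curve may be preferable on real data.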
Data Mining Applied
• Hard to define alert threshold
• Holt-Winters is exponential smoothing
• Lets you define thresholds for alerting!
[Chart: a Holt-Winters forecast yields a better threshold]
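A minimal sketch of how Holt-Winters can produce such a threshold: the standard additive level/trend/season updates give a one-step-ahead forecast, and a band of k times a smoothed absolute forecast error around it serves as the alert threshold. The band construction (including reusing `gamma` to smooth the error), the parameter values, and the short synthetic series (season length 4 instead of, say, 24 hours) are assumptions; the slide does not specify them.

```python
def holt_winters_alerts(ys, m, alpha=0.5, beta=0.1, gamma=0.3, k=3.0):
    """Additive Holt-Winters (level, trend, season of length m) with a band.

    Returns indices t >= m whose value lies outside
    forecast +/- k * smoothed absolute forecast error.
    Smoothing parameters and k are illustrative, not tuned.
    """
    level = sum(ys[:m]) / m
    trend = 0.0
    season = [y - level for y in ys[:m]]            # season[t % m] holds s_(t-m)
    dev = sum(abs(s) for s in season) / m or 1.0    # initial band width (assumption)
    alerts = []
    for t in range(m, len(ys)):
        y = ys[t]
        s_prev = season[t % m]
        forecast = level + trend + s_prev           # one-step-ahead forecast
        err = y - forecast
        if abs(err) > k * dev:
            alerts.append(t)
        new_level = alpha * (y - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % m] = gamma * (y - new_level) + (1 - gamma) * s_prev
        level = new_level
        dev = gamma * abs(err) + (1 - gamma) * dev  # smoothed |error| for the band
    return alerts


ys = [10, 20, 30, 20] * 6   # synthetic series with a season length of 4
ys[18] = 80                 # one injected spike
print(holt_winters_alerts(ys, m=4))  # [18]
```

Unlike a fixed line, the band follows the seasonal shape, so a value that is normal at one time of day can still alert at another.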
copyright (c) 2013 pixlcloud | creating actionable data stories
Internet Service Provider
• Monitoring the entire network
• Shows scans across customers on port 445 (Windows shares)
• New worm emerging
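A scan like this can also be expressed as a count: how many distinct hosts each source contacted on port 445. A minimal sketch, assuming flow records of the form `(src_ip, dst_ip, dst_port)`; the function names, the `min_targets` cutoff, and the documentation-range addresses are illustrative. A growing number of scanning sources from one window to the next is the kind of signal the slide reads as a new worm emerging.

```python
from collections import defaultdict


def port_scanners(flows, port=445, min_targets=20):
    """Return {src: distinct_targets} for sources that contacted at least
    `min_targets` distinct hosts on `port` (the cutoff is an assumption)."""
    targets = defaultdict(set)
    for src, dst, dport in flows:
        if dport == port:
            targets[src].add(dst)
    return {src: len(d) for src, d in targets.items() if len(d) >= min_targets}


def scanners_per_window(windows, **kwargs):
    """Number of scanning sources per time window; a rising count suggests spread."""
    return [len(port_scanners(flows, **kwargs)) for flows in windows]


# Synthetic flows using documentation address ranges.
day1 = [("198.51.100.7", f"203.0.113.{i}", 445) for i in range(30)]
day1 += [("192.0.2.10", "203.0.113.5", 445), ("192.0.2.10", "203.0.113.6", 80)]
day2 = day1 + [("198.51.100.8", f"203.0.113.{i}", 445) for i in range(25)]
print(port_scanners(day1))                 # {'198.51.100.7': 30}
print(scanners_per_window([day1, day2]))   # [1, 2]
```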
Machine Learning - Clustering Users
Source: Email logs
Explanation: The graph shows email communications between employees and outside people. By clustering the data, different user groups become visible automatically. It also revealed an entire cluster that we could not assign to any known group of users!
[Figure: clustered email graph with groups labeled product teams, sales and marketing, competition, and unknown]
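The slide does not name the clustering algorithm. As one hypothetical stand-in, employees can be grouped by how similar their sets of outside correspondents are: Jaccard similarity between contact sets, then single-linkage grouping with union-find. The threshold, the names, and the `.example` domains below are made up; a group that cannot be mapped to a known team is the kind of ‘unknown’ cluster worth investigating.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two contact sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_users(contacts, threshold=0.3):
    """Single-linkage grouping: users whose correspondent sets have Jaccard
    similarity >= threshold land in the same cluster (via union-find)."""
    parent = {user: user for user in contacts}

    def find(user):
        while parent[user] != user:
            parent[user] = parent[parent[user]]  # path halving
            user = parent[user]
        return user

    for a, b in combinations(contacts, 2):
        if jaccard(contacts[a], contacts[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for user in contacts:
        clusters.setdefault(find(user), []).append(user)
    return sorted(sorted(members) for members in clusters.values())


# Synthetic mailbox data: employee -> set of external correspondent domains.
contacts = {
    "ann": {"customer-a.example", "customer-b.example", "partner.example"},
    "bob": {"customer-a.example", "customer-b.example"},
    "carl": {"supplier.example", "oss-project.example"},
    "dana": {"supplier.example", "oss-project.example", "vendor.example"},
    "eve": {"rival-corp.example"},
    "finn": {"rival-corp.example"},
}
print(cluster_users(contacts))  # [['ann', 'bob'], ['carl', 'dana'], ['eve', 'finn']]
```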
Intra-Role Anomaly - Random Order
[Heatmap: users (rows) over time (columns), colored by dc(machines), the distinct count of machines]
Intra-Role Anomaly - With Seriation
Intra-Role Anomaly - Sorted by User Role
[Heatmap rows grouped by role: Administrator, Sales, Development, Finance; annotation: “Admin???”]
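Sorting rows by role turns the question into an intra-role comparison: does a user behave like the rest of their role? A minimal sketch of that last step (not of the seriation step), assuming login records of the form `(user, day, machine)` and a user-to-role mapping from a directory; the function names, the 3x-median rule, and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import median


def avg_distinct_machines(logins):
    """Average over days of dc(machines): distinct machines a user touched per day."""
    per_day = defaultdict(set)
    for user, day, machine in logins:
        per_day[(user, day)].add(machine)
    daily = defaultdict(list)
    for (user, _day), machines in per_day.items():
        daily[user].append(len(machines))
    return {user: sum(v) / len(v) for user, v in daily.items()}


def intra_role_outliers(score, role_of, factor=3.0):
    """Order users by role (the 'sorted by user role' view) and flag users
    whose score exceeds `factor` times their role's median."""
    by_role = defaultdict(list)
    for user, value in score.items():
        by_role[role_of[user]].append(value)
    role_median = {role: median(values) for role, values in by_role.items()}
    rows = sorted(score, key=lambda user: (role_of[user], user))
    flagged = [u for u in rows if score[u] > factor * role_median[role_of[u]]]
    return rows, flagged


logins = [(u, d, f"host{m}") for u in ("adm1", "adm2") for d in (1, 2) for m in range(10)]
logins += [(u, d, f"laptop-{u}") for u in ("s1", "s2", "s3") for d in (1, 2)]
logins += [("s4", d, f"host{m}") for d in (1, 2) for m in range(8)]
roles = {"adm1": "Administrator", "adm2": "Administrator",
         "s1": "Sales", "s2": "Sales", "s3": "Sales", "s4": "Sales"}
rows, flagged = intra_role_outliers(avg_distinct_machines(logins), roles)
print(flagged)  # ['s4']
```

Here the sales user touching eight machines a day behaves like an administrator, which is the kind of row the “Admin???” annotation points at.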
Graphs - A Story
• This looks interesting
• What is it?
• Green -> Port 53
• Only port 53?
• What IPs?
• What’s the time behavior?
• The graph doesn’t answer these questions
Graphs - A Story
• Adding a port histogram
• Select DNS traffic and see if other ports light up.
Note how this is a user experience challenge!
DNS Traffic - A Closer Look
• Linked Views
• Histograms for:
  • Source
  • Port (Source)
  • Destination
• Parallel coordinates (||-coord)
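Behind linked views, a selection made in one view re-filters the data for every other view. A minimal sketch of the DNS question from the previous slides, assuming flow records as dictionaries with `src`, `sport`, `dst`, and `dport` keys; `linked_histograms` and the documentation-range addresses are hypothetical.

```python
from collections import Counter


def linked_histograms(flows, selection, fields=("src", "sport", "dst")):
    """Recompute each linked histogram for the flows matching `selection`."""
    selected = [f for f in flows if selection(f)]
    return {field: Counter(f[field] for f in selected).most_common() for field in fields}


flows = [
    {"src": "192.0.2.10", "sport": 5353, "dst": "198.51.100.53", "dport": 53},
    {"src": "192.0.2.10", "sport": 5354, "dst": "198.51.100.53", "dport": 53},
    {"src": "192.0.2.11", "sport": 40000, "dst": "198.51.100.53", "dport": 53},
    {"src": "192.0.2.11", "sport": 40001, "dst": "203.0.113.80", "dport": 80},
    {"src": "192.0.2.12", "sport": 40002, "dst": "203.0.113.80", "dport": 443},
]

# Step 1: select DNS traffic and look at who is talking.
dns = linked_histograms(flows, lambda f: f["dport"] == 53, fields=("src",))
talkers = {src for src, _count in dns["src"]}

# Step 2: select those sources and check whether other ports light up.
ports = linked_histograms(flows, lambda f: f["src"] in talkers, fields=("dport",))
print(ports["dport"])  # [(53, 3), (80, 1)]
```

The computation is trivial; as the earlier slide notes, the hard part is the user experience of making such selections feel immediate.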
Bringing It All Together
[Screenshot: the analytics interface (Data Stores, Analytics, Forensics, Models, Admin) with the host communication graph and the anomaly decomposition view (Data, Seasonal, Trend, Anomaly Details)]
• “Hunt”
• Explain
• Visual Search
• Big data backend
• Own visualization engine (Web-based)
• Visualization workflows
Security Visualization Community
http://secviz.org
List: secviz.org/mailinglist
Twitter: @secviz
Share, discuss, challenge, and learn about security visualization.
BlackHat Workshop
Visual Analytics - Delivering Actionable Security Intelligence
August 1-6, 2015, Las Vegas, USA
big data | analytics | visualization
http://secviz.org
Further resources:
http://slideshare.net/zrlram
http://secviz.org and @secviz