workshop: big data visualization for security

Raffael Marty, CEO Big Data Visualization for Security UE14 - Romania September 2014

Upload: raffael-marty

Post on 11-Nov-2014


DESCRIPTION

Big Data is the latest hype in the security industry. We will take a closer look at what the big data ecosystem comprises: Hadoop, Spark, ElasticSearch, Hive, MongoDB, etc. We will learn how best to manage security data in a small Hadoop cluster for different types of use cases. Along the way, we will encounter a number of open source big data tools, such as LogStash and Moloch, that help with managing log files and packet captures. As a second topic, we will look at visualization and how we can use it to learn more about our data. In the hands-on part, we will use some of the big data tools, as well as a number of visualization tools, to actively investigate a sample data set.

TRANSCRIPT

Page 1: Workshop: Big Data Visualization for Security

Raffael Marty, CEO

Big Data Visualization for Security

UE14 - Romania September 2014

Page 2: Workshop: Big Data Visualization for Security

Security. Analytics. Insight.

I am Raffy - I do Viz!

IBM Research

Page 3: Workshop: Big Data Visualization for Security


Introduction

Data Sources

DAVIX

Log Data Processing

Agenda

• Big Data Ecosystem
• Security Big Data Tools
• Managing Security Data
• Visualizing Big Data

Page 4: Workshop: Big Data Visualization for Security

http://www.bigdatalandscape.com/

Page 5: Workshop: Big Data Visualization for Security


Big Data - The Three V’s

Volume

Velocity

Variety

Page 6: Workshop: Big Data Visualization for Security

The Big Data Ecosystem


Page 7: Workshop: Big Data Visualization for Security


Hadoop Ecosystem

• Mahout: machine learning
• Hive: data warehouse, HiveQL query language
• Pig: programming language (Pig Latin)
• HBase: big data store; random, real-time read/write access; auto sharding
• Map Reduce
• Impala: interactive SQL queries
• HDFS: distributed file system; data redundancy; fault tolerance; append only; namenode / datanode architecture
• ZooKeeper: centralized “brain”
• Sentry
• Storm

Page 8: Workshop: Big Data Visualization for Security


Berkeley Data Analysis Stack (BDAS)

https://amplab.cs.berkeley.edu/software/

SparkSQL

Page 9: Workshop: Big Data Visualization for Security


Elastic Search

• Schema free & document oriented
• Simple HTTP interface
• Indexes JSON documents
• Queries, aggregations, highlighting, etc.
• Distributed - super easy to add nodes
• Real-time indexing
• Based on Lucene
• Replication
• Partitioning / sharding
  • how an index is assigned to nodes
• Snapshots

Up and running in 10 minutes!!

http://elasticsearch.org
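To make the "simple HTTP interface" and "indexes JSON documents" points concrete, here is a minimal Python sketch that builds an index request without sending it. The host, the index name `logs`, the type name `syslog`, the document ID, and the helper `build_index_request` are all illustrative assumptions; the URL layout follows the index/type/id convention used by Elasticsearch releases of that era.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local instance; adjust as needed


def build_index_request(index, doc_type, doc_id, document):
    """Build a PUT request that would index one JSON document."""
    url = "{}/{}/{}/{}".format(ES_URL, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical index, type, and document, for illustration only.
request = build_index_request(
    "logs", "syslog", "1", {"host": "pixl-ram", "process": "sshd"}
)
print(request.get_method(), request.full_url)
# To actually send it against a running instance:
#   urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect what would go over the wire before pointing the script at a real cluster.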

Page 10: Workshop: Big Data Visualization for Security


Elastic Search - Admin Interface

Page 11: Workshop: Big Data Visualization for Security

Big Data Security Tools


Page 12: Workshop: Big Data Visualization for Security


• Elastic Search

• LogStash

• Kibana

ELK Stack

Page 13: Workshop: Big Data Visualization for Security


LogStash http://logstash.net/

input -> filter -> output

http://www.elasticsearch.org/overview/logstash

Page 14: Workshop: Big Data Visualization for Security


logstash http://logstash.net/

input: files, syslog, email, tcp socket, Flume, AMQP, STOMP, Beanstalk, redis, twitter, HTTP

filter: timestamp parsing, anonymize, drop events, parse fields (grok), multiline joins

output: ElasticSearch, Graylog2/GELF, MongoDB, Nagios, TCP, syslog, WebSockets, AMQP, STOMP, beanstalk, redis

messaging

formats: avro, msgpack, thrift, xml, protobuf, csv

Page 15: Workshop: Big Data Visualization for Security


Storing and Indexing Logs

Raw log:

Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2

Non parsed:

{"text": "Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2"}

Parsed (through grok in LogStash):

{"text": "Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2", "time": "Aug 2 13:29:58", "host": "pixl-ram", "process": "sshd", "pid": 1631}

-> structured search: time > "Aug 1 2014"
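The step from the raw log line to the parsed document can be sketched in a few lines of Python. This is only an approximation of what grok does inside LogStash: the regular expression `SYSLOG_RE` and the helper `parse_syslog` are hypothetical stand-ins written for this one sshd line, not LogStash's own patterns.

```python
import json
import re

# Simplified stand-in for a grok pattern; real grok patterns are more thorough.
SYSLOG_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[^\[\s]+)\[(?P<pid>\d+)\]: "
)


def parse_syslog(line):
    """Return the raw text plus any fields the pattern could extract."""
    doc = {"text": line}
    match = SYSLOG_RE.match(line)
    if match:
        doc.update(match.groupdict())
        doc["pid"] = int(doc["pid"])  # keep the PID numeric, as on the slide
    return doc


raw = ("Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram "
       "from 192.168.30.1 port 49864 ssh2")
print(json.dumps(parse_syslog(raw), indent=2))
```

Lines that do not match still yield a document with only the `text` field, which mirrors the "non parsed" case: searchable as free text, but without structured fields.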

Page 16: Workshop: Big Data Visualization for Security


• Instead of re-writing regexes

• Ships with about 100 patterns

• Patterns you don't have to write yourself

• It is easy to add new patterns

Grok

HOSTNAME \b(?:[0-9A-Za-z].......
IP (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]…
IPORHOST (?:%{HOSTNAME}|%{IP})
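The way named patterns build on each other (IPORHOST referring to HOSTNAME and IP) can be illustrated with a small Python sketch. The pattern bodies in `PATTERNS` are deliberately simplified stand-ins, not the definitions that ship with LogStash, and `expand` is a hypothetical helper that only handles the `%{NAME}` and `%{NAME:field}` forms.

```python
import re

# Simplified stand-in patterns, NOT the definitions that ship with LogStash.
PATTERNS = {
    "HOSTNAME": r"[0-9A-Za-z][0-9A-Za-z.-]*",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "IPORHOST": r"(?:%{HOSTNAME}|%{IP})",
}

# Matches %{NAME} and %{NAME:field}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(expression):
    """Recursively replace pattern references with plain regex syntax."""
    def substitute(match):
        name, field = match.group(1), match.group(2)
        inner = expand(PATTERNS[name])
        if field:
            # A named reference becomes a named capture group.
            return "(?P<{}>{})".format(field, inner)
        return "(?:{})".format(inner)
    return REFERENCE.sub(substitute, expression)


regex = re.compile(expand(r"from %{IPORHOST:src} port (?P<port>\d+)"))
match = regex.search(
    "Accepted publickey for ram from 192.168.30.1 port 49864 ssh2"
)
print(match.groupdict())
```

The point of the sketch is the composition: a rule author writes `%{IPORHOST:src}` once and never touches the underlying regular expressions.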

Page 17: Workshop: Big Data Visualization for Security


ElasticSearch on Grokked Data

• Automatic schema inference
• Assigns analyzers (prefix indexing, etc.)
• Field properties:
  • “store” [field and document level]
  • “index”:
    • “analyzed”: tokenized, analyzed
    • “not_analyzed”: indexed as is
    • “no”: no indexing
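Instead of relying on schema inference, the field properties above can be set explicitly in a mapping. The following Python sketch only builds and prints such a mapping; the type name `syslog`, the field names, and the choice of which field gets which setting are assumptions for illustration. It uses the string-field syntax of the Elasticsearch versions the slide describes, so check the documentation of the version you run before reusing it.

```python
import json

# Hypothetical type name ("syslog") and field choices, for illustration only.
mapping = {
    "mappings": {
        "syslog": {
            "properties": {
                # Tokenized and analyzed: good for free-text search.
                "text": {"type": "string", "index": "analyzed"},
                # Indexed as is: good for exact matches and aggregations.
                "host": {"type": "string", "index": "not_analyzed"},
                "process": {"type": "string", "index": "not_analyzed"},
                "pid": {"type": "integer"},
                # Stored but not searchable.
                "raw_payload": {"type": "string", "index": "no", "store": True},
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Fields such as host names are usually better left unanalyzed, since tokenizing "pixl-ram" into "pixl" and "ram" makes exact filtering and counting harder.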

Page 18: Workshop: Big Data Visualization for Security


Grok Patterns

Pattern database located in:

/opt/logstash/patterns

Debug Grok rules:

http://grokdebug.herokuapp.com/

Page 19: Workshop: Big Data Visualization for Security


LogStash UI - Kibana

Page 20: Workshop: Big Data Visualization for Security


• Block POST / PUT / DELETE to ES instance

• Older versions:

  script.disable_dynamic: true

  action.destructive_requires_name: true

• Use aliases to allow only certain users access to certain indexes

• Use iptables to block ports (9200, 9300, …)

• Performance tuning:

• https://www.loggly.com/blog/nine-tips-configuring-elasticsearch-for-high-performance/

Running ElasticSearch

Page 21: Workshop: Big Data Visualization for Security


Running LogStash

For debugging:

  logstash -e 'input { … } … output { … }'

Other command line parameters:

  -w <number of cores>
  --debug

input {
  stdin { type => "stdin-type" }
  file {
    type => "syslog-ng"
    path => [ "/var/log/*.log", "/var/log/messages" ]
  }
}
output {
  stdout { }
  elasticsearch {
    embedded => false
    host => "192.168.0.23"
    cluster => "logstash-cluster"
    node_name => "logstash"
    protocol => "node"
  }
}

Act as an ES node, not as an unknown client

Page 22: Workshop: Big Data Visualization for Security


Running Kibana

Authentication not built in

Use nginx as a proxy

For example:

https://github.com/elasticsearch/kibana/blob/master/sample/nginx.conf

Page 23: Workshop: Big Data Visualization for Security


Moloch

https://github.com/aol/moloch

Open source, large scale IPv4 packet capturing, indexing and database system powered by elastic search.

Web interface for PCAP browsing, searching, reporting, and exporting PCAPs

Page 24: Workshop: Big Data Visualization for Security


Moloch – Components

• Capture
  • Sniffs the network interface
  • Parses the traffic and creates the Session Profile Information (aka SPI-Data)
  • Writes the packets to disk
• Database
  • Elasticsearch is used for storing and searching through the SPI-Data
• Viewer
  • A web interface that allows for GUI and API access from remote hosts

Page 25: Workshop: Big Data Visualization for Security


Moloch – Capture – SPI-Data Types

• Moloch parses various protocols to create SPI-Data:
  • IP
  • HTTP
  • DNS
    • IP Address
    • Hostname
  • IRC
    • Channel Names
  • SSH
    • Client Name
    • Public Key
  • SSL/TLS
    • Certificate elements of various types (common names, serial, etc.)
• This is not an all inclusive list

Page 26: Workshop: Big Data Visualization for Security


Moloch - Couple Additions

• Web APIs
  • Access meta information
  • Grab PCAPs
• Indexing PCAP files:

  ${moloch_dir}/bin/moloch-capture -c [config_file] -r [pcap_file]

Page 27: Workshop: Big Data Visualization for Security


PacketPig

https://github.com/bigsnarfdude/packetpig

• Analyze PCAP files using Apache Pig
• Number of scripts made available
  • e.g., running SNORT on the PCAPs

pig -x local \
  -f pig/examples/binning.pig \
  -param pcap=data/web.pcap

Page 28: Workshop: Big Data Visualization for Security

copyright (c) 2014 pixlcloud | turning data into actionable insights

Security Onion

• Bro IDS, your choice of Snort or Suricata, Sguil analyst console, ELSA, Squert, Snorby and capME web interfaces
• All set up to work with each other out of the box

http://securityonion.blogspot.com/

Page 29: Workshop: Big Data Visualization for Security

Storing Security Data


Page 30: Workshop: Big Data Visualization for Security


Data Type and Use

• What data do you have?
  • PCAP
  • Flows
  • Context (e.g., threat feeds)
  • “Text” logs
• What’s your use-case?
  • Search
  • Analytics
  • Forensics on PCAP

Index -> Elastic Search

Columnar, SQL enabled

Moloch? Or extract meta data and store PCAP in HDFS/HBase

PCAP in HDFS or HBase

Row or columnar, fixed schema?

Unstructured in ElasticSearch, enrich on ingestion?

ES or relational

Page 31: Workshop: Big Data Visualization for Security


OpenSOC

Page 32: Workshop: Big Data Visualization for Security

Raffael . Marty @ pixlcloud . com


Visualization

Page 33: Workshop: Big Data Visualization for Security


Visualization To …

Present / Communicate

Discover / Explore

Page 34: Workshop: Big Data Visualization for Security


Show Context

42

Page 35: Workshop: Big Data Visualization for Security


Show Context

42 is just a number

and means nothing without context

Page 36: Workshop: Big Data Visualization for Security
Page 37: Workshop: Big Data Visualization for Security


Use Numbers To Highlight Most Important Parts of Data

Numbers

Summaries

Page 38: Workshop: Big Data Visualization for Security


Visualization Creates Context

Visualization Puts Numbers (Data) in Context!

Page 39: Workshop: Big Data Visualization for Security


Principles of Analytic Design

by Edward Tufte

• Show comparisons, contrasts, differences.
• Show causality, mechanism, explanation, systematic structure.
• Show multivariate data; that is, show more than 1 or 2 variables.

Page 40: Workshop: Big Data Visualization for Security


Add Context

Additional information about objects, such as:

• machine
  • roles
  • criticality
  • location
  • owner
  • …
• user
  • roles
  • office location
  • …

source destination

machine and user context

machine role
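Adding machine and user context to an event is essentially a lookup and merge. The sketch below assumes two small in-memory inventories; the inventory contents, the field names, and the helper `add_context` are all hypothetical, and in practice the context would come from an asset database, a directory service, or a similar source.

```python
# Hypothetical inventories, for illustration only.
MACHINES = {
    "10.0.0.2": {"role": "web server", "criticality": "high", "location": "DMZ"},
    "10.0.0.9": {"role": "workstation", "criticality": "low", "location": "office"},
}
USERS = {
    "ram": {"role": "admin", "office": "HQ"},
}


def add_context(event):
    """Return a copy of the event with machine and user context attached."""
    enriched = dict(event)
    for side in ("source", "destination"):
        machine = MACHINES.get(event.get(side), {})
        for key, value in machine.items():
            enriched["{}_{}".format(side, key)] = value
    user = USERS.get(event.get("user"), {})
    for key, value in user.items():
        enriched["user_{}".format(key)] = value
    return enriched


event = {"source": "10.0.0.9", "destination": "10.0.0.2", "user": "ram"}
print(add_context(event))
```

With fields such as `destination_role` attached, a graph can color or group nodes by role instead of showing bare IP addresses.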

Page 41: Workshop: Big Data Visualization for Security


Traffic Flow Analysis With Context

Page 42: Workshop: Big Data Visualization for Security


Visualize Me Lots (>1TB) of Data

SecViz is Hard!

Page 43: Workshop: Big Data Visualization for Security


Data Visualization Workflow

Overview -> Zoom / Filter -> Details on Demand

Principle by Ben Shneiderman

Page 44: Workshop: Big Data Visualization for Security


Backend Support

This visualization process requires:

• Low latency, scalable backend (columnar, distributed data store)
• Efficient client-server communications and caching
• Assistance of data mining to
  • Reduce overall data to look at
  • Highlight relationships, patterns, and outliers
  • Assist analyst in focussing on ‘important’ areas

Page 45: Workshop: Big Data Visualization for Security

Visualization Tools


Page 46: Workshop: Big Data Visualization for Security


Mondrian

http://www.theusrus.de/Mondrian/

• Graphs:
  • Histogram
  • Box plots
  • Scatterplot
  • Mosaic plots
  • Parallel Coordinates
  • ...
• Linking, brushing, …
• Reads CSV files

Page 47: Workshop: Big Data Visualization for Security


Treemap 4.1

www.cs.umd.edu/hcil/treemap

TM3 input files:

Source    Port     Destination  Action
STRING    INTEGER  STRING       STRING
10.0.0.2  80       23.2.1.2     failed
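A TM3 file like the one above can be generated from CSV with a few lines of Python. This sketch assumes that TM3 is a tab-separated format with one line of column names followed by one line of column types, which is what the slide's layout suggests; consult the Treemap documentation for the authoritative format. The helper name `csv_to_tm3` is hypothetical.

```python
import csv
import io


def csv_to_tm3(csv_text, names, types):
    """Convert CSV rows into tab-separated TM3 text with two header lines."""
    lines = ["\t".join(names), "\t".join(types)]
    for row in csv.reader(io.StringIO(csv_text)):
        if row:  # skip blank lines
            lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


sample = "10.0.0.2,80,23.2.1.2,failed\n"
tm3 = csv_to_tm3(
    sample,
    names=["Source", "Port", "Destination", "Action"],
    types=["STRING", "INTEGER", "STRING", "STRING"],
)
print(tm3, end="")
```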

Page 48: Workshop: Big Data Visualization for Security


Gephi http://gephi.org

• Gephi UI
  • interactive link graphs
  • multiple layout algorithms
  • reads: CSV, DOT, GDF, etc.
  • graph metrics
• Gephi Toolkit
  • APIs
• Gephi Plugins
• Gephi ‘Platform’
  • adding JavaFX components

Page 49: Workshop: Big Data Visualization for Security


1. Loading Data

Visually Finding Insight in Gephi

Page 50: Workshop: Big Data Visualization for Security


2. Run Layout Algorithm (Force Atlas 2)

Visually Finding Insight in Gephi

Page 51: Workshop: Big Data Visualization for Security


Visually Finding Insight in Gephi

3. Use Degree as color and size of nodes

Page 52: Workshop: Big Data Visualization for Security


Visually Finding Insight in Gephi

6. Use Preview and export Graph

Page 53: Workshop: Big Data Visualization for Security


AfterGlow - Creating DOT/GDF Files From CSV

CSV File:

aaelenes,Printing Resume
abbe,Information Encryption
aanna,Patent Access
aatharuy,Ping

Graph Language File:

digraph structs {
  graph [label="AfterGlow 1.5.8", fontsize=8];
  node [shape=ellipse, style=filled, fontsize=10, width=1, height=1, fixedsize=true];
  edge [len=1.6];

  "aaelenes" -> "Printing Resume" ;
  "abbe" -> "Information Encryption" ;
  "aanna" -> "Patent Access" ;
  "aatharuv" -> "Ping" ;
}

Parser Grapher

cat file | ./afterglow -c simple.properties -t | neato -Tgif -o test.gif
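The core of the CSV-to-DOT translation is small enough to sketch in Python. This is not AfterGlow's implementation: it ignores the property file, colors, and the three-column mode, and the helper names `quote` and `csv_to_dot` are hypothetical. It only shows how each two-column row becomes one directed edge.

```python
import csv
import io


def quote(value):
    """Quote a node name for DOT, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def csv_to_dot(csv_text):
    """Turn two-column CSV (source,target) into a DOT digraph."""
    lines = ["digraph structs {"]
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:  # ignore blank or incomplete rows
            lines.append("  {} -> {};".format(quote(row[0]), quote(row[1])))
    lines.append("}")
    return "\n".join(lines)


sample = "aaelenes,Printing Resume\nabbe,Information Encryption\n"
dot = csv_to_dot(sample)
print(dot)
```

The resulting text can be piped into a Graphviz layout tool such as neato, just like AfterGlow's output in the command above.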

Page 54: Workshop: Big Data Visualization for Security

Hands On


Page 55: Workshop: Big Data Visualization for Security


Processing Pipeline

1. Get data into ElasticSearch
   Parse data first, then store in ES
2. Get data out of ES (query)
   Get into data format for visualization tool (e.g., CSV)
3. Visualize in the visualization tool
   Potentially translate CSV into other format (e.g., DOT, GDF)

Process the data (aggregation, enhancement, etc.)
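The optional translation step from CSV to GDF might look like the following Python sketch. It assumes the basic GDF layout that Gephi reads, a `nodedef>` section listing nodes followed by an `edgedef>` section listing edges, and it uses only the first two CSV columns as source and target. The helper name `csv_to_gdf` is hypothetical, and values containing commas would need quoting that this sketch does not attempt.

```python
import csv
import io


def csv_to_gdf(csv_text):
    """Turn two-column CSV (source,target) into a minimal GDF graph."""
    nodes = []  # keeps first-seen order
    edges = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue
        source, target = row[0], row[1]
        for node in (source, target):
            if node not in nodes:
                nodes.append(node)
        edges.append((source, target))
    lines = ["nodedef>name VARCHAR"]
    lines.extend(nodes)
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR")
    lines.extend("{},{}".format(s, t) for s, t in edges)
    return "\n".join(lines) + "\n"


sample = "10.0.0.2,23.2.1.2\n10.0.0.2,10.0.0.9\n"
gdf = csv_to_gdf(sample)
print(gdf, end="")
```

For large inputs, a set alongside the list would make the duplicate check faster; the list is kept here so node order stays readable.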

Page 56: Workshop: Big Data Visualization for Security


LogStash Setup - Exercise

1. Check out /home/davix/ue14
   logstash-syslog.conf [read, understand!]
2. Run logstash and index data:
   sudo /opt/logstash/bin/logstash -f logstash-syslog.conf
   head -10 firewall | nc localhost 5000   # send data
3. Check what’s in LogStash:
   sudo /etc/init.d/logstash-web start
   open http://localhost:9292   # kibana
4. Use script to extract data
   read_es.py [check out the script]
   update the script to output a (src_ip, dst_ip, dst_port) tuple
5. Convert the CSV output to a GDF file to then load into Gephi
   OR create a TM3 file for the treemap tool

curl 'http://localhost:9200/_all/_search?q=ACCEPTED'
curl 'http://localhost:9200/twitter/_search?q=user:kimchy'
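For step 4, the extraction logic could look roughly like the sketch below. This is not the workshop's read_es.py, which is not shown in the slides. It assumes a search response in the usual Elasticsearch shape (`hits.hits[]._source`) and assumes that the indexed events carry fields named `src_ip`, `dst_ip`, and `dst_port`, matching the tuple named in the exercise; the actual field names depend on the LogStash configuration. The response below is a hand-written stand-in, not real data.

```python
import csv
import io


def hits_to_csv(response, fields=("src_ip", "dst_ip", "dst_port")):
    """Write one CSV row per hit that contains all requested fields."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        if all(field in source for field in fields):
            writer.writerow([source[field] for field in fields])
    return out.getvalue()


# Hand-written stand-in for a search response; not real data.
fake_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_source": {"src_ip": "10.0.0.2", "dst_ip": "23.2.1.2", "dst_port": 80}},
            {"_source": {"message": "line the parser could not split"}},
        ],
    }
}
print(hits_to_csv(fake_response), end="")
```

Events that lack one of the fields are skipped, so unparsed lines do not end up as half-empty rows in the visualization input.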

Page 57: Workshop: Big Data Visualization for Security


BlackHat Europe - Workshop

VISUAL ANALYTICS DELIVERING ACTIONABLE SECURITY INTELLIGENCE

October 14, 15 - Amsterdam

Page 58: Workshop: Big Data Visualization for Security

copyright (c) 2013 pixlcloud | turning data into actionable insights

Security Visualization Community

Share, discuss, challenge, and learn about security visualization.

• http://secviz.org
• List: secviz.org/mailinglist
• Twitter: @secviz