workshop: big data visualization for security

Raffael Marty, CEO Big Data Visualization for Security UE14 - Romania September 2014




Security. Analytics. Insight.

I am Raffy - I do Viz!

IBM Research


Introduction

Data Sources

DAVIX

Log Data Processing

Agenda

• Big Data Ecosystem
• Security Big Data Tools
• Managing Security Data
• Visualizing Big Data

http://www.bigdatalandscape.com/


Big Data - The Three V’s

• Volume
• Velocity
• Variety

The Big Data Ecosystem


Hadoop Ecosystem

• HDFS: distributed file system; data redundancy, fault tolerance; append only; namenode / datanode architecture
• Map Reduce
• HBase: big data store; random, real-time read/write access; auto sharding
• Hive: data warehouse; HiveQL query language
• Pig: programming language (Pig Latin)
• Impala: interactive SQL queries
• Mahout: machine learning
• ZooKeeper: centralized “brain”
• Sentry
• Storm
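Map Reduce in the list above is a programming model rather than a standalone tool. The following is a minimal single-process sketch of its three phases (map, shuffle, reduce) that counts log lines per host. It is not Hadoop's API; the assumption that the host name is the fourth whitespace-separated field comes from the sshd log line used later in this deck.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (host, 1) for one syslog line.

    Assumption (hypothetical layout): the host name is the fourth
    whitespace-separated field, as in 'Aug 2 13:29:58 pixl-ram sshd[1631]: ...'.
    """
    fields = line.split()
    if len(fields) > 3:
        yield fields[3], 1


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse one key's values into a single count."""
    return key, sum(values)


def map_reduce(lines):
    """Run map, shuffle and reduce in one process (Hadoop distributes these)."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


logs = [
    "Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram",
    "Aug 2 13:30:01 pixl-ram CRON[2001]: (root) CMD (run-parts)",
    "Aug 2 13:31:17 gateway kernel: DROP IN=eth0",
]
print(map_reduce(logs))  # {'pixl-ram': 2, 'gateway': 1}
```

On a cluster, the map and reduce functions stay this small; the framework takes care of splitting the input and moving the grouped pairs between nodes.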


Berkeley Data Analysis Stack (BDAS)

https://amplab.cs.berkeley.edu/software/

SparkSQL

Elasticsearch

• Schema free & document oriented
• Simple HTTP interface
• Indexes JSON documents
• Queries, aggregations, highlighting, etc.
• Distributed - super easy to add nodes
• Real-time indexing
• Based on Lucene
• Replication
• Partitioning / sharding
  • how an index is assigned to nodes
• Snapshots

Up and running in 10 minutes!

http://elasticsearch.org
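To illustrate the "simple HTTP interface" and "indexes JSON documents" points, here is a sketch that builds (but does not send) the HTTP request storing one JSON document. It assumes the Elasticsearch 1.x-era URL layout `/{index}/{type}/{id}`; newer releases use `_doc` in place of the type. The index name `logs` and type `syslog` are hypothetical.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build an HTTP PUT that stores `document` under /index/type/id.

    Assumes the Elasticsearch 1.x URL scheme with mapping types.
    """
    url = f"{base_url.rstrip('/')}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical index/type names for illustration.
req = build_index_request(
    "http://localhost:9200", "logs", "syslog", "1",
    {"host": "pixl-ram", "process": "sshd", "pid": 1631},
)
print(req.get_method(), req.full_url)

# Sending it requires a running node:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```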


Elastic Search - Admin Interface

Big Data Security Tools


ELK Stack

• Elasticsearch
• LogStash
• Kibana

LogStash http://logstash.net/

input -> filter -> output

http://www.elasticsearch.org/overview/logstash

logstash http://logstash.net/

• input: files, syslog, email, TCP socket, Flume, messaging (AMQP, STOMP, Beanstalk, redis), twitter, HTTP
• filter: timestamp parsing, anonymize, drop events, parse fields (grok), multiline joins
• output: ElasticSearch, Graylog2/GELF, MongoDB, Nagios, TCP, syslog, WebSockets, messaging (AMQP, STOMP, Beanstalk, redis)
• formats: avro, msgpack, thrift, xml, protobuf, csv
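The input -> filter -> output model can be pictured as chained generators. The following sketch is not Logstash's implementation; it only mirrors two of the filters listed above ("drop events" and "anonymize"). The drop keyword `CRON` and the hash-based pseudonyms are hypothetical choices.

```python
import hashlib
import json
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def read_input(lines):
    """Input stage: turn each raw line into an event dict."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def drop_events(events, keyword):
    """Filter stage: drop events whose message contains `keyword`."""
    for event in events:
        if keyword not in event["message"]:
            yield event


def anonymize_ips(events):
    """Filter stage: replace IPv4 addresses with a short SHA-256 pseudonym."""
    def pseudonym(match):
        return "ip-" + hashlib.sha256(match.group(0).encode()).hexdigest()[:8]

    for event in events:
        event["message"] = IPV4.sub(pseudonym, event["message"])
        yield event


def write_output(events):
    """Output stage: serialize each event as one JSON line."""
    return [json.dumps(event) for event in events]


def run_pipeline(lines):
    # Hypothetical configuration: drop cron noise, then anonymize addresses.
    return write_output(anonymize_ips(drop_events(read_input(lines), "CRON")))


for out_line in run_pipeline([
    "sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2",
    "CRON[2001]: (root) CMD (run-parts)",
]):
    print(out_line)
```

Because the stages are generators, events flow through one at a time. This loosely matches how a shipper processes a stream without loading it whole.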

Storing and Indexing Logs

Raw log:
Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2

Non parsed:
{"text": "Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2"}

Parsed (through grok in LogStash):
{"text": "Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2", "time": "Aug 2 13:29:58", "host": "pixl-ram", "process": "sshd", "pid": 1631}

-> structured search: time > "Aug 1 2014"
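The following sketch reproduces the parsed record above with a plain Python regular expression. This is not grok itself. The regex is a hypothetical stand-in that only handles the classic `Mon D HH:MM:SS host process[pid]: message` syslog layout.

```python
import re

# Hypothetical regex for the classic BSD syslog layout; grok ships
# far more complete patterns for the same job.
SYSLOG_LINE = re.compile(
    r"(?P<time>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[\w./-]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<msg>.*)"
)


def parse_syslog(text):
    """Return the structured event; unmatched lines keep only the raw text."""
    match = SYSLOG_LINE.match(text)
    if match is None:
        return {"text": text}
    event = {
        "text": text,
        "time": match.group("time"),
        "host": match.group("host"),
        "process": match.group("process"),
    }
    if match.group("pid"):
        event["pid"] = int(match.group("pid"))  # numeric, so range queries work
    return event


raw = ("Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram "
       "from 192.168.30.1 port 49864 ssh2")
print(parse_syslog(raw))
```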

Grok

• Instead of re-writing regexes
• Ships with about 100 patterns
• Patterns you don't have to write yourself
• It is easy to add new patterns

HOSTNAME \b(?:[0-9A-Za-z].......
IP (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]…
IPORHOST (?:%{HOSTNAME}|%{IP})
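Grok patterns are composed by name: `%{IPORHOST}` expands to the definitions of `HOSTNAME` and `IP`. The following sketch shows that recursive expansion in Python. The `HOSTNAME` and `IPV4` regexes are simplified, hypothetical stand-ins, not the definitions shipped with grok (which are longer, and grok's `IP` also covers IPv6).

```python
import re

# Simplified, hypothetical definitions for illustration only.
PATTERNS = {
    "HOSTNAME": r"\b[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:\.[0-9A-Za-z][0-9A-Za-z-]{0,62})*\b",
    "IPV4": (r"(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}"
             r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})(?![0-9])"),
    "IPORHOST": r"(?:%{HOSTNAME}|%{IPV4})",
}

REFERENCE = re.compile(r"%\{(\w+)\}")


def expand(pattern, patterns=PATTERNS):
    """Recursively replace %{NAME} references with their regex definitions."""
    def substitute(match):
        # A function replacement is inserted literally, so backslashes survive.
        return "(?:" + expand(patterns[match.group(1)], patterns) + ")"
    return REFERENCE.sub(substitute, pattern)


iporhost = re.compile("^" + expand("%{IPORHOST}") + "$")
for candidate in ["pixl-ram", "192.168.30.1", "bad host!"]:
    print(candidate, bool(iporhost.match(candidate)))
```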

ElasticSearch on Grokked Data

• Automatic schema inference
• Assigns analyzers (prefix indexing, etc.)
• Field properties:
  • "store" [field and document level]
  • "index":
    • "analyzed": tokenized, analyzed
    • "not_analyzed": indexed as is
    • "no": no indexing
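When automatic inference is not what you want, the field properties above can be set explicitly in a mapping. The following sketch builds such a mapping body in the Elasticsearch 1.x format, where string fields use `"index": "analyzed" | "not_analyzed" | "no"`; from ES 5 on, `string` was replaced by `text`/`keyword`. The helper `build_mapping` and the field choices are hypothetical.

```python
import json

VALID_INDEX_MODES = ("analyzed", "not_analyzed", "no")


def build_mapping(doc_type, fields):
    """Build an ES 1.x-style mapping body (hypothetical helper).

    `fields` maps a field name to (type, index_mode, store).
    """
    properties = {}
    for name, (field_type, index_mode, store) in fields.items():
        if index_mode not in VALID_INDEX_MODES:
            raise ValueError(f"unknown index mode: {index_mode}")
        properties[name] = {"type": field_type, "index": index_mode, "store": store}
    return {doc_type: {"properties": properties}}


mapping = build_mapping("syslog", {
    "text": ("string", "analyzed", False),     # full-text search on the raw line
    "host": ("string", "not_analyzed", True),  # exact matches and term aggregations
    "pid": ("integer", "not_analyzed", False),
})
print(json.dumps(mapping, indent=2))
```

Keeping `host` as `not_analyzed` prevents the analyzer from splitting `pixl-ram` into two tokens, which matters when aggregating per host.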

Grok Patterns

Pattern database located in:
/opt/logstash/patterns

Debug Grok rules:
http://grokdebug.herokuapp.com/


LogStash UI - Kibana


Running ElasticSearch

• Block POST / PUT / DELETE to ES instance
• Older versions:
  script.disable_dynamic: true
  action.destructive_requires_name: true
• Use aliases to allow only certain users access to certain indexes
• Use iptables to block ports (9200, 9300, …)
• Performance tuning:
  • https://www.loggly.com/blog/nine-tips-configuring-elasticsearch-for-high-performance/
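The first and third recommendations can be pictured as a policy check that a reverse proxy in front of Elasticsearch might apply. This is only a sketch. `ACL`, `is_allowed`, and the index names are hypothetical, and Elasticsearch does not provide this function. Note also that blocking POST rules out POST-based `_search` requests, so clients must use GET.

```python
from urllib.parse import urlparse

BLOCKED_METHODS = {"POST", "PUT", "DELETE"}

# Hypothetical per-user allow list of index names or aliases.
ACL = {"analyst": {"logstash-fw", "logstash-ids"}}


def is_allowed(user, method, url, acl=ACL):
    """Return True if the request may be forwarded to Elasticsearch."""
    if method.upper() in BLOCKED_METHODS:
        return False  # read-only: no writes or deletes reach the cluster
    path = urlparse(url).path
    index = path.lstrip("/").split("/", 1)[0]  # first path segment = index or alias
    return index in acl.get(user, set())


print(is_allowed("analyst", "GET", "http://localhost:9200/logstash-fw/_search?q=ACCEPTED"))
print(is_allowed("analyst", "DELETE", "http://localhost:9200/logstash-fw"))
```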

Running LogStash

For debugging:
logstash -e 'input { … } … output { … }'

Other command line parameters:
-w <number of cores>
--debug

input {
  stdin { type => "stdin-type" }
  file {
    type => "syslog-ng"
    path => [ "/var/log/*.log", "/var/log/messages" ]
  }
}
output {
  stdout { }
  elasticsearch {
    embedded => false
    host => "192.168.0.23"
    cluster => "logstash-cluster"
    node_name => "logstash"
    protocol => "node"    # act as an ES node, not as an unknown client
  }
}

Running Kibana

• Authentication not built in
• Use nginx as a proxy, for example:
  https://github.com/elasticsearch/kibana/blob/master/sample/nginx.conf

Moloch

Open source, large scale IPv4 packet capturing, indexing and database system powered by Elasticsearch.

Web interface for PCAP browsing, searching, reporting, and exporting PCAPs.

https://github.com/aol/moloch

Moloch – Components

• Capture
  • Sniffs the network interface
  • Parses the traffic and creates the Session Profile Information (aka SPI-Data)
  • Writes the packets to disk
• Database
  • Elasticsearch is used for storing and searching through the SPI-Data
• Viewer
  • A web interface that allows for GUI and API access from remote hosts

Moloch – Capture – SPI-Data Types

• Moloch parses various protocols to create SPI-Data:
  • IP
  • HTTP
  • DNS: IP address, hostname
  • IRC: channel names
  • SSH: client name, public key
  • SSL/TLS: certificate elements of various types (common names, serial, etc.)
• This is not an all-inclusive list

Moloch - A Couple of Additions

• Web APIs
  • Access meta information
  • Grab PCAPs
• Indexing PCAP files:
  ${moloch_dir}/bin/moloch-capture -c [config_file] -r [pcap_file]

PacketPig

https://github.com/bigsnarfdude/packetpig

• Analyze PCAP files using Apache Pig
• A number of scripts are made available
  • e.g., running Snort on the PCAPs

pig -x local \
    -f pig/examples/binning.pig \
    -param pcap=data/web.pcap

copyright (c) 2014 pixlcloud | turning data into actionable insights

Security Onion

• Bro IDS, your choice of Snort or Suricata, Sguil analyst console, ELSA, Squert, Snorby and capME web interfaces
• All set up to work with each other out of the box

http://securityonion.blogspot.com/

Storing Security Data


Data Type and Use

• What data do you have?
  • PCAP: Moloch? Or extract meta data and store PCAP in HDFS/HBase
  • Flows: row or columnar, fixed schema?
  • Context (e.g., threat feeds): ES or relational
  • “Text” logs: unstructured in Elasticsearch, enrich on ingestion?
• What’s your use-case?
  • Search: index -> Elasticsearch
  • Analytics: columnar, SQL enabled
  • Forensics on PCAP: PCAP in HDFS or HBase

OpenSOC

Raffael . Marty @ pixlcloud . com


Visualization


Visualization To …

• Present / Communicate
• Discover / Explore

Show Context

42


Show Context

42 is just a number and means nothing without context.

Use Numbers To Highlight Most Important Parts of Data

• Numbers
• Summaries

Visualization Creates Context

Visualization Puts Numbers (Data) in Context!


Principles of Analytic Design

by Edward Tufte

• Show comparisons, contrasts, differences
• Show causality, mechanism, explanation, systematic structure
• Show multivariate data; that is, show more than 1 or 2 variables

Add Context

Additional information about objects, such as:
• machine: roles, criticality, location, owner, …
• user: roles, office location, …

[Figure: source/destination graph annotated with machine and user context and machine role]

Traffic Flow Analysis With Context


Visualize Me Lots (>1TB) of Data

SecViz is Hard!

Data Visualization Workflow

Overview -> Zoom / Filter -> Details on Demand

Principle by Ben Shneiderman
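Shneiderman's three steps map naturally onto three data operations. The following sketch shows them over a tiny in-memory list of firewall events. The field names and records are hypothetical. On >1TB of data, each step would be a query against the backend rather than a Python loop.

```python
from collections import Counter


def overview(events, field):
    """Overview first: count how often each value of one field occurs."""
    return Counter(event[field] for event in events)


def zoom_filter(events, **criteria):
    """Zoom and filter: keep events whose fields equal every given value."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]


def details_on_demand(events, position):
    """Details on demand: hand back the full record the analyst picked."""
    return events[position]


# Hypothetical firewall events.
events = [
    {"src_ip": "10.0.0.2", "dst_port": 80, "action": "failed"},
    {"src_ip": "10.0.0.2", "dst_port": 22, "action": "accepted"},
    {"src_ip": "10.0.0.7", "dst_port": 80, "action": "failed"},
]
print(overview(events, "action"))
failed = zoom_filter(events, action="failed")
print(details_on_demand(failed, 1))
```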


Backend Support

This visualization process requires:

• Low latency, scalable backend (columnar, distributed data store)
• Efficient client-server communications and caching
• Assistance of data mining to
  • Reduce overall data to look at
  • Highlight relationships, patterns, and outliers
  • Assist analyst in focussing on ‘important’ areas

Visualization Tools


Mondrian

http://www.theusrus.de/Mondrian/

• Graphs:
  • Histogram
  • Box plots
  • Scatterplot
  • Mosaicplots
  • Parallel Coordinates
  • ...
• Linking, brushing, …
• Reads CSV files

Treemap 4.1

www.cs.umd.edu/hcil/treemap

TM3 input file:

Source     Port     Destination   Action
STRING     INTEGER  STRING        STRING
10.0.0.2   80       23.2.1.2      failed
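A TM3 file has one row of column names, one row of column types, and then the data rows. The following sketch writes that layout from Python tuples. It assumes the fields are tab-separated, as in Treemap 4.x's TM3 format, and that no value contains a tab. `to_tm3` is a hypothetical helper.

```python
import csv
import io


def to_tm3(rows, columns, types):
    """Render rows as TM3 text: names line, types line, then data lines.

    Assumes tab-separated fields (Treemap 4.x TM3 layout).
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerow(types)
    writer.writerows(rows)
    return out.getvalue()


tm3 = to_tm3(
    [("10.0.0.2", 80, "23.2.1.2", "failed")],
    ["Source", "Port", "Destination", "Action"],
    ["STRING", "INTEGER", "STRING", "STRING"],
)
print(tm3)
```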


Gephi http://gephi.org

• Gephi UI
  • interactive link graphs
  • multiple layout algorithms
  • reads: CSV, DOT, GDF, etc.
  • graph metrics
• Gephi Toolkit
  • APIs
• Gephi Plugins
• Gephi ‘Platform’
  • adding JavaFX components

Visually Finding Insight in Gephi

1. Loading Data
2. Run Layout Algorithm (Force Atlas 2)
3. Use Degree as color and size of nodes
6. Use Preview and export Graph
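Step 3 sizes nodes by degree. The following sketch computes what that amounts to outside Gephi: degree counted as in-edges plus out-edges, then scaled linearly into a size range. The size bounds 10 to 50 and the helper names are hypothetical. Gephi's own ranking settings may scale differently.

```python
from collections import Counter


def degree(edges):
    """Degree of each node = number of edges touching it (in + out)."""
    counts = Counter()
    for src, dst in edges:
        counts[src] += 1
        counts[dst] += 1
    return counts


def node_sizes(edges, min_size=10.0, max_size=50.0):
    """Map degree linearly onto [min_size, max_size] (hypothetical bounds)."""
    deg = degree(edges)
    if not deg:
        return {}
    lo, hi = min(deg.values()), max(deg.values())
    span = (hi - lo) or 1  # avoid division by zero when all degrees are equal
    return {node: min_size + (d - lo) * (max_size - min_size) / span
            for node, d in deg.items()}


edges = [("10.0.0.2", "23.2.1.2"), ("10.0.0.2", "23.2.1.3"), ("10.0.0.2", "23.2.1.4")]
print(node_sizes(edges))
```

In this example, the scanning host `10.0.0.2` gets the largest node, which is exactly the kind of outlier the step is meant to surface.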


AfterGlow - Creating DOT/GDF Files From CSV

[Diagram: CSV File -> Graph Language File, via Parser and Grapher stages]

CSV file:
aaelenes,Printing Resume
abbe,Information Encryption
aanna,Patent Access
aatharuv,Ping

Graph language file (DOT):
digraph structs {
  graph [label="AfterGlow 1.5.8", fontsize=8];
  node [shape=ellipse, style=filled, fontsize=10, width=1, height=1, fixedsize=true];
  edge [len=1.6];

  "aaelenes" -> "Printing Resume";
  "abbe" -> "Information Encryption";
  "aanna" -> "Patent Access";
  "aatharuv" -> "Ping";
}

cat file | ./afterglow -c simple.properties -t | neato -Tgif -o test.gif
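The core of the CSV-to-graph step can be sketched in a few lines of Python. The example emits both DOT (for `neato`) and GDF (for Gephi) from two-column CSV. This is not AfterGlow. It does not reproduce AfterGlow's property-driven coloring or its graph attributes. The GDF writer assumes node names contain no commas or quotes and raises an error otherwise.

```python
import csv
import io


def read_pairs(text):
    """Read two-column CSV (source,target) into a list of tuples."""
    return [(row[0], row[1]) for row in csv.reader(io.StringIO(text)) if len(row) >= 2]


def dot_quote(name):
    """Quote a node name for DOT, escaping backslashes and double quotes."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_dot(pairs):
    """Emit a minimal directed DOT graph, one edge per CSV row."""
    lines = ["digraph structs {"]
    lines += [f"  {dot_quote(src)} -> {dot_quote(dst)};" for src, dst in pairs]
    lines.append("}")
    return "\n".join(lines)


def to_gdf(pairs):
    """Emit GDF: a node section followed by an edge section."""
    nodes = list(dict.fromkeys(node for pair in pairs for node in pair))
    for node in nodes:
        if any(ch in node for ch in ",'\""):
            raise ValueError(f"node name needs quoting: {node!r}")
    lines = ["nodedef>name VARCHAR", *nodes, "edgedef>node1 VARCHAR,node2 VARCHAR"]
    lines += [f"{src},{dst}" for src, dst in pairs]
    return "\n".join(lines)


pairs = read_pairs("aaelenes,Printing Resume\nabbe,Information Encryption\n")
print(to_dot(pairs))
print(to_gdf(pairs))
```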

Hands On


Processing Pipeline

1. Get data into ElasticSearch
   • Parse data first, then store in ES
2. Get data out of ES (query)
   • Process the data (aggregation, enhancement, etc.)
   • Get into data format for visualization tool (e.g., CSV)
3. Visualize in the visualization tool
   • Potentially translate CSV into other format (e.g., DOT, GDF)

LogStash Setup - Exercise

1. Check out /home/davix/ue14
   logstash-syslog.conf [read, understand!]
2. Run logstash and index data:
   sudo /opt/logstash/bin/logstash -f logstash-syslog.conf
   head -10 firewall | nc localhost 5000    # send data
3. Check what’s in LogStash:
   sudo /etc/init.d/logstash-web start
   open http://localhost:9292    # kibana
4. Use script to extract data
   read_es.py [check out the script]
   Update the script to output a (src_ip, dst_ip, dst_port) tuple
5. Convert the CSV output to a GDF file to then load into Gephi,
   OR create a TM3 file for the treemap tool

curl 'http://localhost:9200/_all/_search?q=ACCEPTED'
curl 'http://localhost:9200/twitter/_search?q=user:kimchy'
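Step 4 asks for (src_ip, dst_ip, dst_port) tuples. The following sketch turns an Elasticsearch `_search` response into those CSV rows. It is not the workshop's `read_es.py`. It assumes the grok filter stored the firewall fields under the hypothetical names `src_ip`, `dst_ip` and `dst_port` in each hit's `_source`, and skips hits that lack any of them.

```python
import csv
import io


def hits_to_csv(response, fields=("src_ip", "dst_ip", "dst_port")):
    """Convert a _search response (parsed JSON) into CSV text.

    Field names are assumptions; adjust them to your grok pattern.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        if all(field in source for field in fields):
            writer.writerow(source[field] for field in fields)
    return out.getvalue()


sample = {"hits": {"hits": [
    {"_source": {"src_ip": "10.0.0.2", "dst_ip": "23.2.1.2", "dst_port": 80}},
    {"_source": {"message": "unparsed line"}},
]}}
print(hits_to_csv(sample), end="")

# Against a running node (same query as the first curl call above):
# import json, urllib.request
# url = "http://localhost:9200/_all/_search?q=ACCEPTED&size=100"
# with urllib.request.urlopen(url) as resp:
#     print(hits_to_csv(json.load(resp)), end="")
```

The resulting CSV can then be fed to AfterGlow or converted to GDF or TM3 for step 5.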


BlackHat Europe - Workshop

VISUAL ANALYTICS DELIVERING ACTIONABLE SECURITY INTELLIGENCE

October 14, 15 - Amsterdam


Security Visualization Community

Share, discuss, challenge, and learn about security visualization.

• http://secviz.org
• List: secviz.org/mailinglist
• Twitter: @secviz