Real-time, High Throughput Traffic Analytics with Elasticsearch

Chris Bradley
christopher.bradley@verizondigitalmedia.com

February 04, 2016

Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.


Disclaimers

Portions of the data we use to make analytical decisions have been removed.

The removal of this data does not impact the functionality of the system.

•  This redaction is done as a security measure, as our network is under constant threat.
•  The overall system functionality is presented here intact; only a small set of data properties used to make analytical decisions has been removed.
•  File paths, network ports, data sampling rates, and specific packet properties are the types of data that have been generalized or redacted.

Software Versions

We are running recent versions of software but not bleeding edge.

Software discussed herein references the following versions.

•  Elasticsearch – 1.3.1

•  Logstash – 1.4.6

•  Rsyslog – 8.12.0

•  Ubuntu – 14.04 LTS

•  Python – 2.7.5


Disclaimers

One last thing…

We are still using facets instead of aggregations in Elasticsearch, because the system was built before aggregations existed.

We will be switching, but facets still function in the version we run. (They are deprecated, though.)

In the end, facets and aggregations serve very similar functions; aggregations are just more efficient.
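
To illustrate how close the two request styles are, the sketch below builds a terms facet and the equivalent terms aggregation as Elasticsearch 1.x search bodies. The field name `@fields.src` and the label `top_sources` are assumptions for illustration only; they are not the queries Stonefish actually runs.

```python
def terms_facet_body(field, size=10):
    """Search body using the (deprecated) terms facet of Elasticsearch 1.x."""
    return {
        "size": 0,  # only the bucket counts are wanted, not the hits
        "query": {"match_all": {}},
        "facets": {"top_sources": {"terms": {"field": field, "size": size}}},
    }


def terms_aggregation_body(field, size=10):
    """The same question expressed as a terms aggregation."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {"top_sources": {"terms": {"field": field, "size": size}}},
    }
```

Only the top-level key differs (`facets` versus `aggs`), which is why migrating simple queries like this one is mostly mechanical.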


Terminology

Terminology used throughout this presentation

•  Anycast – Global load balancing based on shared IP address assignment.
•  CDN – Content delivery network.
•  GSLB – Global server load balancing.
•  OSI Layer 3 – The network layer of the OSI model. In this case IPv4 or IPv6.
•  OSI Layer 4 – The transport layer of the OSI model. In this case TCP, UDP, or ICMP.
•  Signature – A tuple of packet properties that are common during an attack.
•  Spoof – The action of masking or faking IP addresses as part of an attack.
•  VDMS – Verizon Digital Media Services.


Overview

The VDMS network is under constant attack.

Over the past years we have developed an application (Stonefish) to identify threats in real time and mitigate them quickly.

•  Time to identification and time to resolution are independent metrics that are critical to the performance of our CDN.

The purpose of the Stonefish application is to identify threats at OSI layers 3 and 4 and take action to filter this malicious traffic from our network.


CDNs and the DDOS problem

For CDNs, performance and uptime are the most critical metrics.

The VDMS mission is to be the fastest, most performant CDN on the planet.

•  We are incredibly sensitive to anything that impacts performance.
•  DDOS attacks are a serious threat to customer retention.

Predicting and preventing every single attack is impossible.

•  The design goal of the system was originally to obtain the largest network coverage and protection in a realistic manner.

•  The nature of DDOS attacks makes them relatively easy to initiate but difficult to track to origination point(s).


Stonefish Origin Story

Stonefish originated in 2011.

During this time EdgeCast was suffering from large-scale, multi-day DDOS attacks.

•  Tracking attack signatures was a manual process.
•  Rules to filter signatures were generated manually.
•  These rules were then distributed globally to servers and routers.

This was not an effective means of dealing with attacks.

•  Time to identification and mitigation was not acceptable.
•  The level of effort and engineering resources required were significant.
•  CDN performance was impacted for unacceptable time periods while this manual process occurred.


Stonefish Design Goals

The concept was a system that automatically identifies and stops attacks.

•  Identify an attack on our network anywhere in the world in 60 seconds.

•  Mitigate 99% of attacks on our network within 60 seconds of identification.

•  Monitor connection state information for millions of connections in near real-time.

•  Block the most common attacks without human intervention.


VDMS Simplified POP View

•  Routing – Internet peering and uplinks, ACLs only
•  Firewall – Stateful inspection
•  Load Balance – Intelligent control plane
•  Edge – App layer


Stonefish Ecosystem


Sample Data (IPTables)

Transmit Data (Rsyslog)

Ingest Data (Logstash)

Store Data (Elasticsearch)

Process Data (Stonefish)

Push Rules (Stonefish)


Cluster Hardware

The ‘heart’ of Stonefish is an Elasticsearch cluster

This is a fully collapsible cluster that utilizes the VDMS Anycast architecture to provide 99.999% uptime.

Cluster Composition

•  240 Intel Xeon cores

•  640GB RAM

•  120TB SSD storage

•  Redundant 10Gb Ethernet links to each server in the cluster


Collection and Storage

Data collection occurs at the routing and load balancing layers.

•  The load balancing layer runs Linux.
–  The kernel logs sampled packet data via IPTables.
•  Rsyslog sends JSON-formatted packet data to an Anycast address.
•  Logstash receives the JSON packet data and ingests it into Elasticsearch.

All cluster servers are horizontally scalable.

•  All servers in the cluster run an identical stack.
•  One Anycast address is used to access all servers in the cluster.
•  Any server in the cluster can ingest packet data.


Data flow from edge to cluster

Flow (diagram): Inbound traffic → Load balancers → Kernel packet logging → Rsyslog JSON formatting → Rsyslog TCP TLS transmission → Stonefish Cluster (Logstash ingestion) → Stonefish Cluster (Elasticsearch storage) → Stonefish Threat Detection → Stonefish Dashboard

Traffic types shown: SYN, UDP, DNS, ICMP


Linux Kernel Packet Logging

The kernel provides packet logging via the IPTables user space tools.

Example Log Rules

iptables -t filter -A log_chain -p tcp -m tcp --tcp-flags SYN SYN -m statistic --mode random --probability 0.0001 -j LOG --log-prefix "ipt-type1: " --log-level 7

iptables -t filter -A log_chain -p icmp -m statistic --mode random --probability 0.0001 -j LOG --log-prefix "ipt-type2: " --log-level 7

iptables -t filter -A log_chain -p udp --dport 53 -m statistic --mode random --probability 0.0001 -j LOG --log-prefix "ipt-type3: " --log-level 7

iptables -t filter -A log_chain -p udp -m statistic --mode random --probability 0.0001 -j LOG --log-prefix "ipt-type4: " --log-level 7

Output

Chain log_chain (1 references)

pkts bytes target prot opt in out source destination

0 0 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp flags:0x02/0x02 statistic mode random probability 0.0001 LOG flags 0 level 7 prefix "ipt-type1: "

0 0 LOG icmp -- * * 0.0.0.0/0 0.0.0.0/0 statistic mode random probability 0.0001 LOG flags 0 level 7 prefix "ipt-type2: "

0 0 LOG udp -- * * 0.0.0.0/0 0.0.0.0/0 udp dpt:53 statistic mode random probability 0.0001 LOG flags 0 level 7 prefix "ipt-type3: "

0 0 LOG udp -- * * 0.0.0.0/0 0.0.0.0/0 statistic mode random probability 0.0001 LOG flags 0 level 7 prefix "ipt-type4: "

Result

[11507705.594434] ipt-type2: IN=bond0 OUT= MAC=ec:f4:bb:e7:13:a8:78:19:f7:38:0f:f0:08:00 SRC=10.0.0.1 DST=10.1.0.1 LEN=68 TOS=0x00 PREC=0x00 TTL=61 ID=32341 PROTO=ICMP TYPE=3 CODE=1 [SRC=93.184.221.200 DST=87.209.64.100 LEN=40 TOS=0x00 PREC=0x00 TTL=58 ID=5286 DF PROTO=TCP SPT=443 DPT=56484 WINDOW=290 RES=0x00 ACK FIN URGP=0 ] MARK=0x8

[11507804.647848] ipt-type1: IN=bond0 OUT= MAC=ec:f4:bb:e7:13:a8:78:19:f7:38:0f:f0:08:00 SRC=10.0.0.2 DST=10.1.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=58 ID=28666 DF PROTO=TCP SPT=59999 DPT=80 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x7

[11507804.942317] ipt-type1: IN=bond0 OUT= MAC=ec:f4:bb:e7:13:a8:78:19:f7:38:0f:f0:08:00 SRC=10.0.0.3 DST=10.1.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=56 ID=53421 DF PROTO=TCP SPT=59761 DPT=80 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x7
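
The kernel log lines in the Result output are mostly KEY=VALUE pairs plus bare flag tokens such as DF or SYN. As a minimal sketch of how such a line could be turned into a record, the Python below splits one line into a dictionary. The function name and the record layout are assumptions made for illustration; Stonefish itself does this extraction in Rsyslog, not in Python.

```python
import re

# "[uptime] prefix: body", e.g. "[11507804.647848] ipt-type1: IN=bond0 ..."
LINE_RE = re.compile(
    r"^\[\s*(?P<uptime>[\d.]+)\]\s+(?P<prefix>[\w-]+):\s+(?P<body>.*)$"
)


def parse_packet_log(line):
    """Parse one iptables LOG line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = {
        "uptime": float(match.group("uptime")),
        "prefix": match.group("prefix"),
        "flags": [],
    }
    for token in match.group("body").split():
        if "=" in token:
            key, value = token.split("=", 1)
            # Keep the first occurrence so the outer header wins over any
            # embedded header (ICMP errors repeat keys inside brackets).
            record.setdefault(key, value)
        else:
            record["flags"].append(token)
    return record
```

This sketch does not treat the bracketed header embedded in ICMP error messages specially; a production parser would need to handle that case separately.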


Rsyslog data transmission

Rsyslog templating is not user friendly, but it is powerful and fast.

Example /etc/rsyslog.d/10-ipt-log-transmit.conf

$template msg_parse, "{\"@timestamp\": \"%TIMESTAMP:::date-rfc3339%\", \"@hostname\": \"%hostname%\", \"@fields\": {\"field1\": \"%msg:R,ERE,1,BLANK,0:FLD1=([0-9?.]+)--end%\", \"field2\": \"%msg:R,ERE,1,BLANK,0:FLD2=([0-9?.]+)--end%\", \"field3\": \"%msg:R,ERE,1,BLANK,0:FLD3=([0-9]+)--end%\", \"field4\": \"%msg:R,ERE,1,BLANK,0:FLD4=([0-9]+)--end%\", \"field5\": \"%msg:R,ERE,1,BLANK,0:FLD5=([0-9]+)--end%\", \"field6\": \"%msg:R,ERE,1,BLANK,0:FLD6=([0-9]+)--end%\", \"field7\": \"%msg:R,ERE,1,BLANK,0:FLD8=(0x[0-9a-f]+)--end%\"}, \"@tags\": [\"ipt-type1\"]}\n"

$MainMsgQueueSize 1000
$MainMsgQueueDiscardMark 800
$MaxMessageSize 1000k
$WorkDirectory /path/to/disk_assisted_queue_dir/daq
$ActionQueueFileName packetlog
$ActionQueueMaxDiskSpace 10g
$ActionQueueSaveOnShutdown on
$ActionQueueType LinkedList
$ActionResumeRetryCount -1
$SystemLogRateLimitInterval 0
$DefaultNetstreamDriverCAFile /path/to/tls/cert_file/cert.crt
$ActionSendStreamDriver gtls
$ActionSendStreamDriverMode 1
$ActionSendStreamDriverAuthMode x509/name
$ActionSendStreamDriverPermittedPeer *.domainname.tld

:msg, contains, "ipt-type1" @@anycast.ip.address:9999;msg_parse
& stop


Rsyslog result

Rsyslog transforms the kernel packet log into JSON and puts it onto the wire.

Kernel packet log

[11507804.647848] ipt-type1: IN=bond0 OUT= MAC=ec:f4:bb:e7:13:a8:78:19:f7:38:0f:f0:08:00 SRC=10.0.0.1 DST=10.1.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=58 ID=28666 DF PROTO=TCP SPT=59999 DPT=80 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x7

Rsyslog JSON output

{
  "_source": {
    "@timestamp": "2015-09-01T01:54:51.813+00:00",
    "@srvtype": "server_type",
    "@source_host": "hostname",
    "@fields": {
      "src": "10.0.0.1",
      "dst": "10.1.0.1",
      "field1": "99",
      "field2": "99",
      "field3": "99",
      "field4": "99",
      "mark": "0x7"
    },
    "@tags": ["ipt-type1"],
    "@version": "1",
    "type": "ipt-type1",
    "host": "127.0.0.1"
  }
}
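
The transformation from kernel log line to JSON can be mimicked outside Rsyslog for testing. The sketch below applies one regular expression per output field and emits an empty string when nothing matches, which is roughly what the BLANK option of the Rsyslog regex property replacer does. The field map (`src`, `dst`, `dpt`, `mark`) is an assumption for illustration, since the real field list is redacted in this presentation.

```python
import json
import re

# Hypothetical field map: JSON key -> regex with exactly one capture group.
FIELD_PATTERNS = {
    "src": r"SRC=([0-9.]+)",
    "dst": r"DST=([0-9.]+)",
    "dpt": r"DPT=([0-9]+)",
    "mark": r"MARK=(0x[0-9a-f]+)",
}


def extract(pattern, message):
    """Return capture group 1 of the first match, or "" when nothing matches."""
    match = re.search(pattern, message)
    return match.group(1) if match else ""


def to_json_event(message, timestamp, hostname, tag):
    """Build a JSON event shaped like the template output for one log line."""
    event = {
        "@timestamp": timestamp,
        "@hostname": hostname,
        "@fields": {key: extract(p, message) for key, p in FIELD_PATTERNS.items()},
        "@tags": [tag],
    }
    return json.dumps(event, sort_keys=True)
```

Emitting a blank value rather than dropping the key keeps every event the same shape, which simplifies the index template on the Elasticsearch side.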


Rsyslog data transmission alternatives

There are many more user-friendly alternatives in production use today.

•  Logstash has egress functionality.

•  Lumberjack is a lightweight log shipper for Logstash.

•  Custom code could be written to transport messages.

•  Publish-subscribe messaging systems could be used (NSQ, Kafka, RabbitMQ, etc.).

•  Many GitHub projects exist, written in Python, Go, or your language of choice.

If we were building from the ground up today, Rsyslog might not be selected.


Logstash Ingestion

Our configuration is broken into two parts: input and output.

Logstash is highly configurable and powerful; we use it in its most basic form.

The configuration file uses Logstash's own configuration syntax.

input {
  tcp {
    port => 9999
    type => 'custom_type_here'
    ssl_enable => 'true'
    ssl_cacert => '/path/to/certificate_authority/ca.crt'
    ssl_cert => '/path/to/server/cert/server.crt'
    ssl_key => '/path/to/server/key/server.key'
    codec => json { charset => "UTF-8" }
  }
}

output {
  if "ipt-type1" in [@tags] {
    elasticsearch_http {
      index => 'index-type1-%{+YYYY.MM.dd}'
      host => 'your.es.host.ip.or.localhost'
      template => '/path/to/your/elasticsearch/index/template/file.tmpl'
      template_name => 'custom_template_name'
      template_overwrite => true
    }
  }
}
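
The `%{+YYYY.MM.dd}` suffix in the output block means each event lands in an index named for the UTC date of its timestamp. Any client that queries the data has to compute the same names. The Python 3 sketch below shows one way to do that; the helper names are hypothetical, and only the `index-type1` prefix comes from the configuration above.

```python
from datetime import datetime, timedelta, timezone


def daily_index_name(prefix, event_time):
    """Render the daily index name for a timezone-aware datetime, using its UTC date."""
    utc_time = event_time.astimezone(timezone.utc)
    return "{}-{}".format(prefix, utc_time.strftime("%Y.%m.%d"))


def indices_for_range(prefix, start, end):
    """List every daily index touched by a query window (both ends inclusive)."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    names = []
    while day <= last:
        names.append("{}-{}".format(prefix, day.strftime("%Y.%m.%d")))
        day += timedelta(days=1)
    return names
```

A query covering the last few minutes normally touches one index, or two when the window crosses midnight UTC, so listing indices explicitly avoids searching months of data.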


Elasticsearch and Data Storage

Stonefish randomly samples n% of all inbound traffic.

•  Each protocol type is stored in its own Elasticsearch index, with a new index created each day.

•  100s of millions of records per protocol type per day are archived.

•  Data set granularity ranges from seconds to months.

Packet records in Elasticsearch are treated as immutable.

•  Packet records are never modified.

•  Some updates and upserts are done to threat data for detected attacks.
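
Because only a random sample of packets is stored, counts read back from Elasticsearch have to be scaled by the sampling probability to estimate real traffic rates. The sketch below shows that arithmetic. The function name is hypothetical, and the 0.0001 probability used in the usage note is the generalized value from the example log rules, not the production sampling rate.

```python
def estimate_actual_rate(sampled_count, sampling_probability, window_seconds):
    """Scale a sampled packet count to an estimated packets-per-second rate."""
    if not 0 < sampling_probability <= 1:
        raise ValueError("sampling_probability must be in (0, 1]")
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Each stored record stands in for roughly 1 / probability real packets.
    return sampled_count / sampling_probability / window_seconds
```

For example, 600 sampled SYN records in a 60 second window at a probability of 0.0001 correspond to an estimate of about 100,000 packets per second. The estimate becomes noisier as the sampled count gets smaller.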


Elasticsearch configuration and performance

Configuration is one of the most important pieces of performance.

Elasticsearch is a Java application, so configuring the Java environment makes an enormous impact on performance.

The Oracle JVM outperformed distribution-specific bundles in all of our testing.

•  A few key environment variables:
–  JAVA_OPTS="-Xms30g -Xmx30g"
–  ES_HEAP_SIZE=30g

•  Async replication allows for faster ingestion.

•  Bulk ingestion can increase storage performance when dealing with large volumes of data.
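
As an illustration of bulk ingestion, the sketch below serializes a batch of documents into the newline-delimited body accepted by the Elasticsearch `_bulk` endpoint: one action line followed by one source line per document. The index and type names in the usage note are assumptions; in Stonefish the ingestion is handled by Logstash rather than by custom code like this.

```python
import json


def build_bulk_body(index, doc_type, documents):
    """Serialize documents into a _bulk request body (action line + source line each)."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

A call such as `build_bulk_body("index-type1-2015.09.01", "ipt-type1", docs)` produces a single request body for the whole batch, replacing one HTTP round trip per document.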


Elasticsearch and security

Later versions of Elasticsearch support authentication and encryption natively.

An Nginx SSL reverse proxy allows for many authentication schemes and encrypted communications.

•  Nginx can provide discrete URL-based authentication schemes:
–  Auth tokens for writing data from servers
–  LDAP/AD/Kerberos for secure user access


Stonefish brings it all together


Visualizing the data

Custom dashboard based on Flask, Angular, and D3


The result – Seeing attacks in real-time


Histograms show anomalies clearly

Normal traffic is consistent

Threats are abrupt changes to traffic pattern


Elasticsearch does the heavy lifting

Elasticsearch facets allow us to query packet data for anomalies.

•  Facets allow us to write query logic based on the type of data we want to see. We do not have to write ‘fuzzy’ logic or custom algorithms to determine where anomalies lie.

•  Elasticsearch finds common packet properties based on threshold information supplied in the queries.

•  Elasticsearch’s built-in trend data is also used to determine the pattern a particular signature is taking on (trending up, trending down, etc.).

Building the above functionality in house would have greatly complicated our task and extended our timeframe to deployment.


Data Analysis

Elasticsearch data is continually analyzed for changes to packet metrics.

•  This is done via custom Python code.

•  The VDMS software retrieves scores for time intervals and compares them to previous intervals for anomalous or out-of-bounds changes.

•  Each application type has custom query and detection logic for the most accurate identification possible.

Many factors determine if traffic is a threat.

•  Does this traffic exceed a minimum threshold for this POP?

•  Are there signatures that exceed a minimum threshold for this POP?

•  If the answer to either of these questions is yes, generate signatures and score them.
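
The interval comparison described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Stonefish detection logic: the function name, the use of a mean baseline, and both threshold parameters (`min_threshold`, `max_ratio`) are hypothetical.

```python
def is_anomalous(current, previous, min_threshold, max_ratio):
    """Flag an interval that clears a minimum floor and jumps relative to recent intervals.

    current       -- packet count (or score) for the newest interval
    previous      -- list of counts for earlier intervals, used as the baseline
    min_threshold -- floor below which traffic is never treated as a threat
    max_ratio     -- how many times the baseline the current value may reach
    """
    if current < min_threshold:
        return False
    if not previous:
        return False  # nothing to compare against yet
    baseline = sum(previous) / float(len(previous))
    if baseline == 0:
        return True  # traffic appeared where there was none
    return current / baseline >= max_ratio
```

The floor matters as much as the ratio: a small POP going from 10 to 50 packets per interval is a fivefold jump, but it is not worth acting on.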


Drilling into the data

Elasticsearch facets are used as part of query and filter logic.

•  Aggregations build scores on combinations of packet header properties.

•  Signatures are weighted against one another to determine the percentage of new connections globally and local to a POP.

•  Comparisons to previous data are made to confirm normal/abnormal behavior.
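
To make the idea of scoring combinations of header properties concrete, the sketch below counts how often each combination occurs in a batch of packet records and reports the share of traffic it represents. In Stonefish this counting is done inside Elasticsearch by facets; the local Python version, the function name, and the example field names (`ttl`, `window`) are assumptions for illustration.

```python
from collections import Counter


def top_signatures(packets, fields, min_share):
    """Return (signature, share) pairs for property combinations above min_share.

    packets   -- list of dicts, one per sampled packet
    fields    -- header properties that make up a candidate signature
    min_share -- minimum fraction of the batch a combination must represent
    """
    if not packets:
        return []
    counts = Counter(tuple(p.get(f) for f in fields) for p in packets)
    total = float(len(packets))
    return [
        (dict(zip(fields, combo)), count / total)
        for combo, count in counts.most_common()
        if count / total >= min_share
    ]
```

Running the same function over one POP's records and over the global set gives the local and global percentages that are weighed against one another.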


Acting on the data

Anomalous traffic patterns detected

•  What is the percentage of total POP and global traffic?

•  Does this traffic exceed a minimum threshold for us to take action?

•  Is this anomalous traffic valid traffic from a customer event in progress?
–  This decision is key, as large customers can spike heavily.

•  Is the signature specific enough to filter only attack traffic?
–  A custom signature scoring algorithm answers this.

•  Mitigation rules are globally distributed within 60 seconds when thresholds are exceeded.

•  The system automatically removes mitigation rules a pre-determined amount of time after an attack is no longer observed.
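
The automatic removal of mitigation rules can be sketched as a table that remembers when each signature was last observed and expires entries after a hold period. The class name, method names, and the use of plain numeric timestamps are assumptions for illustration; the actual hold time used by Stonefish is not stated in this presentation.

```python
class MitigationTable(object):
    """Track active mitigation rules and expire them once the attack goes quiet."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self.last_seen = {}  # signature -> time attack traffic was last observed

    def observe(self, signature, now):
        """Record that traffic matching the signature was seen at time `now`."""
        self.last_seen[signature] = now

    def expire(self, now):
        """Remove and return rules not observed for at least hold_seconds."""
        expired = [
            sig for sig, seen in self.last_seen.items()
            if now - seen >= self.hold_seconds
        ]
        for sig in expired:
            del self.last_seen[sig]
        return expired
```

Every new observation of a signature resets its timer, so a rule stays in place for as long as the attack continues and is withdrawn only after a full quiet period.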


Signature scoring

Custom algorithm generates a score for a signature.

•  The score gives us an indication of how specific or generic a signature is.

•  Generic signatures can filter valid traffic; the more specific the signature, the better and the higher the score.

•  Scores range from 1 to 10.

•  Scores greater than 5 are very specific and safe to deploy.

•  Lower scored signatures require engineering review to determine if they should be deployed.
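
The scoring algorithm itself is custom and not disclosed here, so the sketch below is only a hypothetical stand-in: it scores a signature by how many header properties it pins down, then applies the deployment rule from this slide (scores greater than 5 deploy automatically, lower scores go to engineering review). The function names and the property list are assumptions.

```python
AUTO_DEPLOY_ABOVE = 5  # from the slide: scores greater than 5 are safe to deploy


def specificity_score(signature, known_fields):
    """Hypothetical score in 1..10: more pinned header properties means more specific."""
    pinned = sum(1 for f in known_fields if signature.get(f) is not None)
    return 1 + (9 * pinned) // len(known_fields)


def deployment_action(score):
    """Map a 1..10 score to 'deploy' (automatic) or 'review' (engineering review)."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    return "deploy" if score > AUTO_DEPLOY_ABOVE else "review"
```

With four known properties, a signature that pins only one scores 3 and is sent for review, while one that pins all four scores 10 and is deployed automatically.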


Threat storage

A separate Elasticsearch index is used to store threats and their signatures.

•  Track attack start and stop times

•  Anycast VIP being attacked

•  The protocol(s) used in the attack

•  The signatures associated with the attack

•  Rate of attack traffic

This provides us with a historical view of all attacks targeted at our network.


Threat Visibility


Elasticsearch allows for historical reporting

Storing threats separately allows for fast and easy reporting.

A single month view in 2015


Metric                     Value
Total Attacks              107
Average Per Day            3
Average Attack Size        336,212/sec
Median Attack Size         211,700/sec
Max Attack Size            3,752,166/sec
Average POP Traffic %      22%
Median POP Traffic %       17%
Max POP Traffic %          81%
Average Attack Duration    0:21:08
Median Attack Duration     0:23:00
Max Attack Duration        13:45:00
Total Time Under Attack    52:42:00


Success Stories

The system has been largely successful.

•  1000s of attacks have been identified and stopped by Stonefish.

•  Minimal performance impact has been experienced as a result of DDOS attacks.

•  No news is good news.
–  VDMS has not been in the news regarding DDOS or outages.
–  Over the past years many networks have been impacted by sizable sustained attacks.

•  Stonefish identified its first attack 30 seconds after going into production.

•  The system has had 100% uptime for over 24 months.

•  Stonefish is protecting over 13Tbps of network capacity across 100+ POPs.
–  5% of the total Internet is powered by VDMS.


Additional information

Follow up

•  Chris Bradley

•  christopher.bradley@verizondigitalmedia.com

Is this something you would like to work on?

http://jobs.verizondigitalmedia.com


Thank you.
