cinbad cern/hp procurve joint project on networking 26 may 2009 ryszard erazm jurga - cern milosz...

16
CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

Upload: andra-miller

Post on 17-Jan-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

CINBAD CERN/HP ProCurve Joint Project

on Networking

26 May 2009

Ryszard Erazm Jurga - CERNMilosz Marian Hulboj - CERN

Page 2: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

Outline

Introduction to CERN network

CINBAD project and its goals

Data - sources, collection and analysis

Results and conclusions

2

Page 3: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

3

10-1 40-S2 376-R

VaultComputer Center

887-R 513-C

Vault

874-CCC

513-CCR

513-CCR

Meyrin area

CERN sites

Farms

Internet

230x

40x50x

100x

1000x

or

Minor Starpoints 100x

21x 433x

SL1PL1

BL1 RL1

RL2BL2RL3BL3

BL4

RL4

BL5

RL5

BL7 RL7 BB16

RB16 RB17

BB17

BB20

RB20

BB52 ZB52

AB52 RB52

ZB51BB51

RB51 AB51

ZB50 BB50

AB50 RB50

BB53 ZB53

AB53 RB53

BB54 ZB54

RB54 AB54

BB55 ZB55

AB55RB55

BB56

ZB56

AB56RB56

IB1

IB2

BT15

RT16

TT1 TT2 TT3 TT4 TT5 TT6 TT7

TT8

BT4 RT5 BT5 RT6

BT6 BT7 BT8 BT9 BT10 BT11RT7 RT8 RT9 RT10 RT11 RT12

BT2

RT3

BT1

RT2

BT3

RT4

BT13

RT14

BT12

RT13

Prevessin area

BT16

RT17

PG1

PG2

SG2

SG1

2-S 874-R

Computer Center

LHC area

15x15x

CDR

DAQ

AG

ATCN

90x

AG

AG

15x

AG

Control

CDR

DAQ

HLT

TNTN

AG

AG

15x

25x

10x10x

CDR

DAQ

AG

Control

90x

AG

AG

TN TN Control

AG 8x

10x10x

CDR

DAQ

Monitoring

2175

2275

2375

2475

2575

2675

2775

2875

513 874

874 874212 354 376 513

Simplified overall CERN campus network topology

Numbers:

150+ 10Gb routers

1000 10Gb ports

2’200+ switches

50’000 active user devices

70’000 1Gb user ports

140 Gbps WAN connectivity

4.8 Tbps LCG Core

Page 4: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

CINBAD codename deciphered

CERN Investigation of Network Behaviour and Anomaly Detection

Project Goal

“To understand the behaviour of large computer networks (10’000+ nodes) in High Performance Computing or large Campus installations to be able to: Detect traffic anomalies in the system Be able to perform trend analysis Automatically take counter measures Provide post-mortem analysis facilities “

Page 5: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

What is anomaly? (1)

Anomalies are a fact in computer networks Anomaly definition is very domain specific:

But there is a common denominator: “Anomaly is a deviation of the system from the normal

(expected) behaviour (baseline)” “Normal behaviour (baseline) is not stationary and is not

always easy to define” “Anomalies are not necessarily easy to detect”

5

Network faults Malicious attacks Viruses/worms

Misconfiguration … …

Page 6: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

What is anomaly? (2)

Just a few examples of anomalies:

The network infrastructure misuse• unauthorised DHCP/DNS server (either malicious or

accidental)• network scans• worms and viruses

Violation of a local network/security policy • NAT, TOR usage (not allowed at CERN)

6

Page 7: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

CINBAD project principle

data sources

collectors

storage

analysis

Page 8: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

sFlow – main data source

Based on packet sampling (RFC 3176)

on average 1-out-of-N packet is sampled by an agent and sent to a collector

• packet header and payload included (max 128 bytes)– switching/routing/transport protocol information– application protocol data (e.g. http, dns)

SNMP counters included

low CPU/memory requirements – scalable

For more details, see our technical reporthttp://openlab-mu-internal.web.cern.ch/openlab-mu-internal/Documents/2_Technical_Documents/Technical_Reports/2007/RJ-MM_SamplingReport.pdf

8

Page 9: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

sFlow data usage

Typically: traffic accounting (e.g. for billing, network planning, SLA etc.)

Useful for post-mortem network analysis CERN-wide network tcpdump

• access to the whole CERN sampled network traffic• information where and when the traffic has been seen• particularly useful in detecting packets ‘that should not be

there’ (policy violation)

Rare examples of sflow usage for anomaly detection

9

Page 10: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

Other data sources

Packet sampling data is not enough! Data is partial, cannot provide 100% accuracy Not always easy to identify the anomaly

More data to understand flow of data in the network External sources provide useful information and time

triggers Correlation between various data sources

Example: Central Antivirus Service at CERN

10

Page 11: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

CINBAD sFlow data collection

Collector

RedundantCollector

Up to 1Gbps of raw sFlowMulticasted to two collectors

Level I Disk Storage

UnpackedData

Level I Processing Level II Processing

CINBAD DB

Level II Storage

Level III ProcessingData Mining

Configurator

Current collection based on traffic from ~1000 switches ~6000 sampled packets per second~ 3500 snmp counter sets per second

Page 12: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

Multistage data storage

Stage 1 sFlow datagram tree-like format unpacked into CINBAD file

format to enable fast direct access Minimal space overhead introduced

Stage 2 Oracle DB as a long-term storage data aggregation

• tradeoff between sFlow randomness and space, data lifetime and anomaly detection requirements

– e.g. number of destination IPs for a given source IP

12

Page 13: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

Data analysis

Various approaches are being investigated Statistical analysis methods

• detect a change from “normal network behaviour”– selection of suitable metrics is needed

• can detect new, unknown anomalies• poor anomaly type identification

Signature based• we ported SNORT to work with sampled data• performs well against know problems• tends to have low false positive rate• does not work against unknown anomalies

13

Page 14: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

Synergy from both detection techniques

14

Sample-basedSNORT evaluation

engine

Rules

Translation

StatisticalAnalysisengine

TrafficProfiles

AnomalyAlertsIncoming sflow stream

New baselines

New Signatures

Page 15: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

Current results

Campus and Internet traffic analysis Identified anomalies which went undetected by

the central CERN IDS

Detected number of misbehaviors• both statistical and pattern matching approaches used• TOR, DNS abuse, Trojans, worms, network scans, p2p

applications, rogue DHCP servers, etc.

Findings reported to the CERN security team Security team adapted their policies

15

Page 16: CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN

Achievements and next steps

It has been demonstrated that pattern matching for anomaly detection is possible with sflow data Payload data is a key advantage of sflow sflow allows distributed detection at the edge of

the network

Combining statistical analysis with pattern matching is providing encouraging initial results

Integration of the two mechanisms holds promise for zero-day detection

16