Scalable NetFlow Analysis with Hadoop
![Page 1: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/1.jpg)
Scalable NetFlow Analysis with Hadoop
Yeonhee Lee and Youngseok Lee
{yhlee06, lee}@cnu.ac.kr
http://networks.cnu.ac.kr/~yhlee
Chungnam National University, Korea
January 8, 2013
FloCon 2013
![Page 2: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/2.jpg)
Contents
• Introduction
• Overview
• Hadoop-based traffic processing tool
• Evaluation
• Summary
![Page 3: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/3.jpg)
INTRODUCTION
![Page 4: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/4.jpg)
Internet Measurement
• Challenges
  • Scalability
  • Fault-tolerant system
  • Extensibility
• CAIDA data
  • Capture, Curation, Storage, Search, Sharing, Analysis, and Visualization
  • Ark topology: 1.8 TB
  • Telescope: 102 TB
  • Packet headers: 18.8 TB
Josh Polterock, “CAIDA: A Data Sharing Case Study,” Security at the Cyber Border: Exploring Cybersecurity for International Research Network Connections workshop, 2012
![Page 5: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/5.jpg)
Harness Distributed Computing and Storage
• 1 PB sorting by Google
  • 2008: 6 hours and 2 minutes on 4,000 computers
  • 2011: 33 minutes on 8,000 computers
  • 2011: 10 PB, 8,000 computers, 6 hours and 27 minutes
• Google MapReduce, 2004
• Apache Hadoop project
![Page 6: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/6.jpg)
Our Proposal
Hadoop-based Traffic Measurement and Analysis Platform
[Figure: platform architecture. Labels: Packet and NetFlow v5 inputs; Traffic Collector; Master and Slave nodes; HDFS / Hadoop; Pcap I/O, NetFlow I/O, Bin I/O; Traffic Analysis Mapper & Reducer; Traffic Analyzer; Web Visualizer / Hive; Administrator]
1. Yeonhee Lee and Youngseok Lee, "Toward Scalable Internet Traffic Measurement and Analysis with Hadoop," ACM SIGCOMM Computer Communication Review (CCR), Jan. 2013
2. Yeonhee Lee and Youngseok Lee, "A Hadoop-based Packet Trace Processing Tool," TMA, April 2011
3. Yeonhee Lee and Youngseok Lee, "Detecting DDoS Attacks with Hadoop," ACM CoNEXT Student Workshop, Dec. 2011
![Page 7: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/7.jpg)
Related Work
• Traffic analysis of DNS root server (RIPE, 2011.11)
• PacketPig (2012.03) - Big Data Security Analytics platform
• Sherpasurfing - Open Source Cyber Security Solution, Hadoop World 2011
  • Firewall/IDS logs, netflow/packet
• Performing Network and Security Analytics with Hadoop (Travis Dawson, Narus), Hadoop Summit 2012
• Distributed Bro (IDS)
![Page 8: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/8.jpg)
OVERVIEW
![Page 9: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/9.jpg)
Hadoop-based NetFlow Analysis
[Figure: overview diagram. Labels: NetFlow, Distributor, Collect & Analysis]
![Page 10: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/10.jpg)
[Figure: software architecture in three layers. Labels recovered from the diagram, grouped by layer as far as the extraction allows:]
• Data Source (Jpcap, HDFS)
  • Distributor: Packet, NetFlow
  • Traffic Collector & Loader
• Data Processing (HDFS, MapReduce, Hive)
  • IO formats: Pcap InputFormat, Binary Input/OutputFormat, Text Input/OutputFormat
  • Traffic Analyzer
    • MapReduce for Traffic Analysis: IP analysis MR, TCP analysis MR, HTTP analysis MR, DDoS analysis MR, NetFlow analysis MR
    • Hive QL Query for Traffic Analysis: Scan IP query, Spoofed IP query, Heavy User query, User-defined query
  • Hadoop HDFS
• User Interface (Hive, Web)
  • Web UI, CLI (monitor, query)
![Page 11: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/11.jpg)
HADOOP-BASED TRAFFIC ANALYSIS
![Page 12: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/12.jpg)
Challenges
1. Data handling issue in HDFS
2. Distributed traffic analysis MapReduce algorithms
3. Performance tuning in a large-scale Hadoop testbed
• Distributed computation
• Fault tolerance
• Scalability (~TB/PB)
![Page 13: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/13.jpg)
Challenges
1. Data handling issue in Hadoop
2. Distributed traffic analysis MapReduce algorithms
3. Performance tuning in a large-scale Hadoop testbed
![Page 14: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/14.jpg)
Block-level Parallelism
[Figure: file-level processing vs. block-level processing of a packet trace stored across HDFS blocks (HDFS Block2 and HDFS Block3, 64 MB each). Sample trace bytes shown in the figure:]
14 14 00 00 68 2B AD 4C 38 A4 04 00 5C 00 00 00 5C 00 00 00 FF FF ‥‥
00 21 B5 01 68 2B AD 4C 2B 1C 07 00 3C 00 00 00 3C 00 00 00 01 80 ‥
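With block-level processing, a map task starts reading at an arbitrary byte offset inside the trace, so it must locate the first packet record boundary by itself. The sketch below shows one plausible heuristic, assuming libpcap's 16-byte per-record header (timestamp seconds, microseconds, captured length, original length, little-endian as in the sample bytes on this slide). The function names, the plausibility checks, and the timestamp window are illustrative assumptions, not the authors' implementation.

```python
import struct

# libpcap per-record header: ts_sec, ts_usec, caplen, wirelen (little-endian here)
REC_HDR = struct.Struct("<IIII")


def plausible_caplen(buf, off, ts_min, ts_max, max_len=65535):
    """Return caplen if the 16 bytes at `off` look like a record header, else None."""
    if off < 0 or off + REC_HDR.size > len(buf):
        return None
    ts_sec, ts_usec, caplen, wirelen = REC_HDR.unpack_from(buf, off)
    # The timestamp must fall inside the (assumed known) capture period.
    if not (ts_min <= ts_sec <= ts_max and ts_usec < 1_000_000):
        return None
    # The captured length can never exceed the original packet length.
    if not (0 < caplen <= wirelen <= max_len):
        return None
    return caplen


def find_first_record(block, ts_min, ts_max):
    """Scan a block byte by byte; return the offset of the first plausible record, or -1."""
    for off in range(len(block) - REC_HDR.size + 1):
        caplen = plausible_caplen(block, off, ts_min, ts_max)
        if caplen is None:
            continue
        nxt = off + REC_HDR.size + caplen
        # Accept if the following header is also plausible, or if it lies beyond this block.
        if (nxt + REC_HDR.size > len(block)
                or plausible_caplen(block, nxt, ts_min, ts_max) is not None):
            return off
    return -1
```

Checking the following header as well reduces the chance that payload bytes which happen to resemble a header are mistaken for a record boundary.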
![Page 15: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/15.jpg)
Block-level IO vs. File-level IO
[Figure: completion time (min) and speed-up vs. # of nodes. Series: IP Analysis_blockIO, IP Analysis_fileIO, SpeedUp vs fileIO. Labelled speed-up values: 3.5, 3.9, 4.3]
![Page 16: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/16.jpg)
Challenges
1. Data handling issue in Hadoop
2. Distributed traffic analysis MapReduce algorithms
3. Performance tuning in a large-scale Hadoop testbed
![Page 17: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/17.jpg)
Aggregation
[Figure: aggregation MapReduce job with a Map Phase and a Reduce Phase. Input is read from HDFS with Block IO, rules are supplied through DistributedCache, and results are written to HDFS]
• Input: IP/UDP packet, NetFlow v5 header, v5 records
• Processing steps (shown for each parallel task): packet identification, decoding, filtering, group-key generation, aggregation
• Filtering Rule: cnu;srcip=168.188.0.0-168.188.255.255
• Aggregation Rule: as;ip;subnet;port;protocol;srcas;dstas;srcip;dstip;srcsubnet;dstsubnet;srcport;dstport;
• Intermediate pairs: K: time|AS, V: count
• Output: counts per AS
• Result tables (keyed by Port, Protocol, AS, Subnet): # of octets, # of packets, # of flows
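The aggregation job on this slide can be sketched in plain Python as a map step that decodes NetFlow v5 datagrams, applies the `srcip` range filter from the slide, and emits `time|AS` keys, followed by a reduce step that sums the values. The struct layouts follow the standard NetFlow v5 export format; the 5-minute time bucket, the function names, and the choice of source AS as the key are assumptions made for illustration, not the authors' code.

```python
import struct
from collections import defaultdict
from ipaddress import IPv4Address

V5_HEADER = struct.Struct("!HHIIIIBBH")             # 24-byte NetFlow v5 header
V5_RECORD = struct.Struct("!IIIHHIIIIHHBBBBHHBBH")  # 48-byte NetFlow v5 flow record

# Filtering rule from the slide: srcip=168.188.0.0-168.188.255.255
SRC_LO = int(IPv4Address("168.188.0.0"))
SRC_HI = int(IPv4Address("168.188.255.255"))


def map_datagram(payload, bucket_secs=300):
    """Decode one v5 datagram; yield ((time_bucket, src_as), (octets, packets, flows))."""
    version, count, _uptime, unix_secs, *_rest = V5_HEADER.unpack_from(payload, 0)
    if version != 5:
        return
    bucket = unix_secs - unix_secs % bucket_secs  # assumed 5-minute bucket
    for i in range(count):
        rec = V5_RECORD.unpack_from(payload, V5_HEADER.size + i * V5_RECORD.size)
        srcaddr, packets, octets, src_as = rec[0], rec[5], rec[6], rec[15]
        if SRC_LO <= srcaddr <= SRC_HI:           # filtering step
            yield (bucket, src_as), (octets, packets, 1)


def reduce_counts(pairs):
    """Sum octets, packets and flows per (time_bucket, src_as) key."""
    totals = defaultdict(lambda: [0, 0, 0])
    for key, (octets, packets, flows) in pairs:
        entry = totals[key]
        entry[0] += octets
        entry[1] += packets
        entry[2] += flows
    return {key: tuple(value) for key, value in totals.items()}
```

In a real Hadoop job the pairs yielded by `map_datagram` would be shuffled by key before reaching the reducer; here `reduce_counts(map_datagram(payload))` chains the two steps in one process.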
![Page 18: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/18.jpg)
Anomaly Detection
[Figure: anomaly-detection MapReduce job with Map Phase, Shuffle & Sort, and Reduce Phase. Input is read from HDFS with Block IO and results are written to HDFS]
• Input: IP/UDP packet, NetFlow v5 header, v5 records
• Processing steps (shown for each parallel task): packet identification, decoding, pattern matching, group-key generation, aggregation, partitioning & group sort, detection
• Detection Rules (via DistributedCache):
  port_scan;ip,proto=6;srcip,dstport;srcip;pkts=20-
  syn_flood;ip,proto=6,syn-fin=1-;srcip,dstip;srcip,dstip;syn-fin=6-
• Example output:
  syn_flood 3.3.3.3 3.3.3.4 …
  PortScan 1.1.1.1 1.1.1.2 …
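As a rough illustration of the `port_scan` rule, the sketch below filters TCP flows (`proto=6`), keys them by `(srcip, dstport)`, groups by `srcip`, and flags a source once it reaches a threshold. It assumes that a source is reported when it touches at least 20 distinct destination ports; the slide's `pkts=20-` condition may instead count packets, so both the threshold semantics and the function name are assumptions.

```python
from collections import defaultdict


def detect_port_scan(flows, min_ports=20):
    """flows: iterable of dicts with 'srcip', 'dstport' and 'proto' keys.

    Returns the sorted list of source IPs that contacted at least
    `min_ports` distinct TCP destination ports (assumed detection condition).
    """
    ports_by_src = defaultdict(set)
    for flow in flows:
        if flow["proto"] != 6:          # filter pattern: TCP only
            continue
        # Map-out key is (srcip, dstport); grouping by srcip collects the ports.
        ports_by_src[flow["srcip"]].add(flow["dstport"])
    return sorted(src for src, ports in ports_by_src.items()
                  if len(ports) >= min_ports)
```

Grouping on `srcip` alone mirrors the rule's partition and group-sort key, so all ports probed by one source arrive at the same reducer.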
![Page 19: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/19.jpg)
Challenges
1. Data handling issue in Hadoop
2. Distributed traffic analysis MapReduce algorithms
3. Performance tuning in a large-scale Hadoop testbed
![Page 20: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/20.jpg)
Performance Tuning
• Configuration
  • Hadoop IO Buffer (128 KB to 1 MB)
  • Java heap space (300 MB to 1,024 MB)
  • # of MapReduce Slots (# of cores)
• MapReduce Algorithm
  • normal combiner vs inMapper combiner
• Job scheduling
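The difference between a normal combiner and an in-mapper combiner can be sketched as follows. A plain mapper emits one pair per input record and leaves merging to a later combiner pass, whereas an in-mapper combiner keeps partial sums in memory and emits each key once when the input split is finished. The class and function names are hypothetical and the example counts occurrences of a key such as a port number; it is a minimal single-process model, not Hadoop API code.

```python
from collections import Counter


def map_plain(keys):
    """Plain mapper: emit one (key, 1) pair per record."""
    for key in keys:
        yield key, 1


class InMapperCombiner:
    """Mapper that accumulates counts in memory and emits once per key at close."""

    def __init__(self):
        self.partial = Counter()

    def map(self, key):
        self.partial[key] += 1      # no output yet, only local aggregation

    def close(self):
        yield from self.partial.items()


def reduce_sum(pairs):
    """Reducer: sum the values for every key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)
```

Both variants give the same reduced totals; the in-mapper version emits fewer intermediate pairs at the cost of holding the partial sums in the task's heap, which is one reason heap size appears in the tuning list.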
![Page 21: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/21.jpg)
Job Scheduling
• Different job types
  • Periodic jobs (for monitoring)
    • guaranteed service within time
    • e.g., Aggregated Statistics for monitoring, Flow Parse job for analytics
  • Small ad-hoc query job (for analytics)
    • fast response time
[Figure: timeline in 5-minute intervals. Rows: Collector (Collect, Collect, Collect, …); Fair Scheduler (Basic Statistics, Flow Parse, ad-hoc query); FIFO Scheduling (Basic Statistics, Flow Parse, ad-hoc query). Legend: ad-hoc job, periodic job]
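Why the scheduling policy matters for small ad-hoc queries can be seen in a toy model. The sketch below compares finish times under FIFO and under equal sharing for jobs that are all submitted at time 0 on a cluster with one unit of capacity. It is an idealized processor-sharing model with made-up job sizes and hypothetical function names, not a simulation of Hadoop's actual schedulers.

```python
def fifo_finish_times(jobs):
    """jobs: list of (name, work) in submission order; each job runs to completion in turn."""
    clock, finished = 0.0, {}
    for name, work in jobs:
        clock += work
        finished[name] = clock
    return finished


def fair_finish_times(jobs):
    """Active jobs share capacity equally; the job with least remaining work finishes first."""
    remaining = dict(jobs)
    clock, finished = 0.0, {}
    while remaining:
        active = len(remaining)
        name, work = min(remaining.items(), key=lambda item: item[1])
        clock += work * active          # each active job advances at rate 1/active
        finished[name] = clock
        del remaining[name]
        for other in remaining:
            remaining[other] -= work    # every other job progressed by the same amount
    return finished


# A long periodic job submitted just ahead of a short ad-hoc query (sizes in minutes)
jobs = [("flow_parse", 4.0), ("ad_hoc_query", 0.5)]
```

With these inputs the ad-hoc query finishes at 4.5 under FIFO but at 1.0 under equal sharing, while the periodic job slips only from 4.0 to 4.5.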
![Page 22: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/22.jpg)
PERFORMANCE EVALUATION
![Page 23: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/23.jpg)
Experiments
• Testbed

| Type | Nodes | Cores | CPU | Memory | Hard Disk | Rack |
|---|---|---|---|---|---|---|
| Small | 3 | 24 | 3.4 GHz 8 core | 16 GB | 2 TB | 1 Rack |
| Medium | 30 | 240 | 2.93 GHz 8 core | 16 GB | 4 TB | 1 Rack |
| Large | 200 | 400 | 2.66 GHz 2 core | 2 GB | 500 GB | 4 Racks |

• Data and MapReduce jobs

| Type | Dataset | MapReduce Job | Testbed |
|---|---|---|---|
| NetFlow | 1 TB from KOREN | flowStats, flowDetect, flowPrint | Small |
| Packet | 1 ~ 5 TB from CNU campus N/W | IP, TCP, Web (webpop, User Behavior, DDoS) | Medium, Large |
![Page 24: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/24.jpg)
NetFlow: SpeedUp (vs. Flowtools)
[Figure: speed-up vs. # of nodes (1 to 3). Series: flowStats vs. Flowtools, flowPrint vs. FlowTools. Labelled values: 5.0, 2.3]
> FlowPrint
flow-cat -p flowfile | flow-print -f14
> FlowStats
flow-cat -p flowfile | flow-stat -f12
flow-cat -p flowfile | flow-stat -f5
![Page 25: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/25.jpg)
NetFlow: Scalability
[Figure, left: job completion time (min) vs. # of nodes (5 to 30). Series: flowStats, flowDetect. Labelled values: 103, 17]
[Figure, right: speed-up (vs. 5 nodes) vs. # of nodes (5 to 30). Series: flowStats, flowDetect. Labelled value: 6.1]
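Speed-up relative to the 5-node run is the ratio of completion times. As a worked check, assuming the labelled values 103 and 17 are the completion times in minutes of the same job on 5 and 30 nodes (the extracted figure does not confirm which points the labels belong to):

```latex
S(n) = \frac{T(5)}{T(n)}, \qquad
S(30) \approx \frac{103\ \text{min}}{17\ \text{min}} \approx 6.06
```

This is in line with the labelled speed-up of 6.1, which was presumably computed from unrounded times; ideal linear scaling from 5 to 30 nodes would give a speed-up of 6.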
![Page 26: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/26.jpg)
NetFlow: Pattern Matching Result
[Figure: count (log scale, 1 to 100000) vs. date, 8/29/2012 to 10/28/2012. Series: worm.sasser,w32.sasser; remote_administrator; vnc; w32.witty.worm; worm.opasoft,w32.opaserv.worm; code_red_worm; netfairy; kamun; emule; shockwave_killer; worm.killmsblast,w32.nachi.worm,w32.welchia.worm]
[Figure: NetFlows Record Distribution, # of records (M)]
![Page 27: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/27.jpg)
Packet: ScaleOut
[Figure: throughput (Gbps) vs. # of nodes (5 to 30). Series: IP Analysis, TCP Analysis, WebPop, UserBehavior, DDoS. Labelled values: 9, 13]
[Figure: completion time (min) vs. # of nodes (5 to 30). Series: IP Analysis, TCP Analysis, WebPop, UserBehavior, DDoS. Labelled values: 121, 15, 13, 77]
![Page 28: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/28.jpg)
Packet: SizeUp (30 nodes)
[Figure: completion time (sec) and throughput (Gbps) vs. data size (1 TB to 5 TB). Completion-time series: IP Analysis, IP Analysis_ripe, TCPStats, Webpop, UserBehavior, DDoS. Throughput series: IP Analysis, TCPStats, Webpop, UserBehavior, DDoS. Labelled values: 2934, 7007]
![Page 29: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/29.jpg)
Packet: SizeUp (200 nodes)
[Figure: completion time (sec) and throughput (Gbps) vs. data size (1 TB to 5 TB). Series for both axes: IP Analysis, TCP Analysis, Webpop, UserBehavior, DDoS. Labelled values: 2744, 8, 15]
![Page 30: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/30.jpg)
SUMMARY
![Page 31: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/31.jpg)
Summary
• NetFlow analysis with Hadoop
  • NetFlow v5 processing module
  • MapReduce algorithms: statistics
• Distributed computing and storage with Hadoop
  • Fits Internet measurement application
  • Scalability
• Source codes are available at
  • Packet, NetFlow
    • https://sites.google.com/a/networks.cnu.ac.kr/dnlab/research/hadoop
    • https://github.com/ssallys/pcap-on-Hadoop
![Page 32: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/32.jpg)
Ongoing Work
• Distributed real-time monitoring
  • Rule matching for streamed NetFlow
  • Developing rule for MapReduce
  • Rule classification for dedicated rule matching
• Integration
  • Streaming packages
  • Enhanced analytics
    • Data mining: Mahout
    • Machine learning
• Scalable collection
  • E.g., 10GE, 10 x 1 GE
[Figure: Hadoop analytics stack. Labels: HDFS, MapReduce, Hive, Pig, Mahout, RHive, RHadoop, Rhipe; axes: Performance, Productivity]
![Page 33: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/33.jpg)
Reference
• Papers
  1. Y. Lee and Y. Lee, "Toward Scalable Internet Traffic Measurement and Analysis with Hadoop," ACM SIGCOMM Computer Communication Review (CCR), Jan. 2013
  2. Y. Lee, W. Kang, and Y. Lee, "A Hadoop-based Packet Trace Processing Tool," The Third TMA, April 2011
  3. Y. Lee and Y. Lee, "Detecting DDoS Attacks with Hadoop," ACM CoNEXT Student Workshop, Dec. 2011
• Software
  1. http://networks.cnu.ac.kr/~yhlee
  2. https://sites.google.com/a/networks.cnu.ac.kr/dnlab/research/hadoop
  3. https://github.com/ssallys/pcap-on-Hadoop
![Page 34: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/34.jpg)
THANK YOU !
![Page 35: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/35.jpg)
[Figure: resource usage (%) over time (tics) for IP Analysis. Series: mem_usage, disk_read_usage, disk_write_usage, cpu_usage, net_in_usage, net_out_usage]
[Figure: resource usage (%) over time (tics) for TCP Analysis. Series: mem_usage, disk_read_usage, disk_write_usage, cpu_usage, net_in_usage, net_out_usage]
![Page 36: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/36.jpg)
[Figure: cluster layout. Labels: LoadBalancer / JobTracker / NameNode; three nodes each labelled Collector / TaskTracker / Datanode; PF_RING; PF_RING, RAID; DBMS]
![Page 37: Scalable NetFlow Analysis with Hadoop](https://reader033.vdocument.in/reader033/viewer/2022052611/5f03e2d67e708231d40b4074/html5/thumbnails/37.jpg)
[Figure: data flow from the Internet to visualization. Labels: network interface; load balancer; PF_RING / RSS; on-the-fly packets; packets; NetFlow record; Input Format; Mapper; Reducer; each statistics; aggregated statistics; table records; visualized statistics]
Rule format: rule name; filter pattern; mapout key; partition&groupsort key; detection condition; action
ex)
port_scan;ip,proto=6;srcip,dstport;srcip;pkts=20-
syn_flood;ip,proto=6,syn-fin=1-;srcip,dstip;srcip,dstip;syn-fin=6-
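The semicolon-separated rule format lends itself to a small parser. The sketch below splits a rule line into the fields named on this slide and reads a condition such as `pkts=20-` as a range with an open upper bound, by analogy with the `168.188.0.0-168.188.255.255` range in the filtering rule. That reading, the class and function names, and the treatment of the action field as optional (the examples omit it) are all assumptions.

```python
from typing import NamedTuple, Optional, Tuple


class Condition(NamedTuple):
    field: str
    low: int
    high: Optional[int]  # None: no upper bound (assumed meaning of a trailing "-")

    def matches(self, value: int) -> bool:
        return value >= self.low and (self.high is None or value <= self.high)


class Rule(NamedTuple):
    name: str
    filters: Tuple[str, ...]
    mapout_key: Tuple[str, ...]
    group_key: Tuple[str, ...]
    condition: Condition
    action: Optional[str]


def _fields(text: str) -> Tuple[str, ...]:
    """Split a comma-separated field list, dropping empty entries."""
    return tuple(part.strip() for part in text.split(",") if part.strip())


def parse_condition(text: str) -> Condition:
    """Parse 'name=low-' or 'name=low-high'; the name itself may contain a hyphen."""
    field, _, bounds = text.partition("=")
    low, _, high = bounds.partition("-")
    return Condition(field.strip(), int(low), int(high) if high else None)


def parse_rule(line: str) -> Rule:
    parts = [part.strip() for part in line.strip().split(";")]
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(parts)}")
    name, filters, mapout, group, condition = parts[:5]
    action = parts[5] if len(parts) == 6 and parts[5] else None
    return Rule(name, _fields(filters), _fields(mapout), _fields(group),
                parse_condition(condition), action)
```

For example, `parse_rule("port_scan;ip,proto=6;srcip,dstport;srcip;pkts=20-")` yields a rule whose condition matches any packet count of 20 or more under the assumed semantics.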