Measurement and Performance Analysis of Supercomputing Traffic by FlowScan+ 2.0. Supercomputing Center of KISTI. Kookhan Kim. August 28, 2003.


Supercomputing Center

Measurement and Performance Analysis of Supercomputing Traffic by FlowScan+ 2.0

Supercomputing Center of KISTI

Kookhan Kim

August 28, 2003


Contents

• Introduction
• FlowScan
• FlowScan+ 2.0
• Traffic Measurement & Analysis
• Others


Introduction

• We operate various types of supercomputers
  – NEC, IBM, Compaq, PC cluster

• Supercomputing traffic
  – All traffic generated between the supercomputers and their users while computing on many kinds of data
    • Users have an authenticated and authorized ID

• Until now, we had not tried to measure or analyze supercomputing traffic

• We want to know the characteristics of supercomputing traffic
  – Who uses it?
  – Which applications and protocols are used?
  – How much traffic is generated?

• To meet these demands, we improved FlowScan


What is FlowScan?

• FlowScan is a passive measurement tool that draws traffic graphs by analyzing network flows exported by routers and switches
  – NetFlow is exported by Cisco routers and switches

• It was developed by Dave Plonka and managed by CAIDA (http://www.caida.org)

• Main modules - Perl scripts
  – cflowd (a flow collection engine)
  – flowscan (the central process in the system)
    • Our improvement focuses on this module
  – RRDtool (a visualization tool)

• Definition: Flow
  – An IP flow is a unidirectional series of IP packets of a given protocol, travelling between a source and destination, within a certain period of time.
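To make the flow definition concrete, here is a minimal Python sketch (not part of FlowScan) that groups packets into unidirectional flows. It assumes packets arrive as simple tuples and takes a hypothetical 15-second idle timeout as the "certain period of time"; real NetFlow exporters also key on fields such as ToS and input interface and additionally enforce an active timeout.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Flow key used in this sketch: (src IP, dst IP, src port, dst port, protocol).
# Direction matters, so A->B and B->A packets form two separate flows.
FlowKey = Tuple[str, str, int, int, int]


@dataclass
class Flow:
    key: FlowKey
    first: float      # timestamp of the first packet (seconds)
    last: float       # timestamp of the most recent packet
    packets: int = 0
    octets: int = 0


def build_flows(packets, idle_timeout: float = 15.0) -> List[Flow]:
    """Group (ts, src, dst, sport, dport, proto, length) tuples into flows.

    A packet extends an open flow with the same key if it arrives within
    idle_timeout seconds (a hypothetical value) of that flow's last packet;
    otherwise the old flow is closed and a new one starts.
    """
    active: Dict[FlowKey, Flow] = {}
    finished: List[Flow] = []
    for ts, src, dst, sport, dport, proto, length in sorted(packets):
        key = (src, dst, sport, dport, proto)
        flow = active.get(key)
        if flow is not None and ts - flow.last > idle_timeout:
            finished.append(flow)  # idle too long: export the old flow
            flow = None
        if flow is None:
            flow = Flow(key, first=ts, last=ts)
            active[key] = flow
        flow.last = ts
        flow.packets += 1
        flow.octets += length
    finished.extend(active.values())  # flush flows still open at the end
    return finished
```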


Enhanced FlowScan+

• The goal
  – Build a good passive measurement tool

• The motivations
  – Lack of a traffic measurement tool that supports real-time visualization and detailed traffic analyses on demand
  – To build a user-friendly tool that anyone can use easily

• Why FlowScan?
  – An open-source program
  – It has good graphing functions on the web
  – But it does not yet support a query interface

• Who is involved?
  – Supercomputing Center of KISTI
  – System Architecture Lab., Dept. of Computer Science, KAIST


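FlowScan+ 2.0

[Figure: FlowScan+ 2.0 architecture, made up of the FlowScan original module, the analysis module (FlowScan+ 1.0), and the visualization module (FlowScan+ 2.0). Labeled components: NetFlow v7 input, Flow-tools, flowscan, RRD, static graph, parsed data, DB aggregation (15 min), link query, dynamic graph.]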


FlowScan+ Main Point

• FlowScan+ 1.0
  – Uses MySQL
    • Stores NetFlow information in the DB
      – Raw flows
      – Aggregated data
  – Query interface
    • Access to the DB
    • By Web
    • Easy to use

• FlowScan+ 2.0
  – Flow-tools
    • Addresses the NetFlow version problem
  – User group editing
    • Small groups, large groups
    • Divided by IP class
  – Visualization of DB query results
    • Java Servlet, JFreeChart
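FlowScan+ 1.0 stores raw flows in MySQL and answers queries over them. The sketch below illustrates that idea with Python's built-in sqlite3 module standing in for MySQL; the table name rawflows, its columns, and the helper functions are hypothetical, not FlowScan+'s actual schema or query interface.

```python
import sqlite3

# Hypothetical raw-flow table; sqlite3 stands in for MySQL in this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rawflows (
    start_time INTEGER,  -- flow start time, UNIX seconds
    src_ip     TEXT,
    dst_ip     TEXT,
    src_port   INTEGER,
    dst_port   INTEGER,
    protocol   INTEGER,
    packets    INTEGER,
    bytes      INTEGER
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_flows(conn, flows):
    """Insert raw flow records given as tuples in column order."""
    conn.executemany("INSERT INTO rawflows VALUES (?, ?, ?, ?, ?, ?, ?, ?)", flows)
    conn.commit()


def traffic_of_host(conn, ip, t_from, t_to):
    """Flow count and total bytes to or from one host in [t_from, t_to)."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(bytes), 0) FROM rawflows "
        "WHERE (src_ip = ? OR dst_ip = ?) AND start_time >= ? AND start_time < ?",
        (ip, ip, t_from, t_to),
    ).fetchone()
    return {"flows": count, "bytes": total}


def top_sources(conn, n=10):
    """Aggregated view: the n source addresses that sent the most bytes."""
    return conn.execute(
        "SELECT src_ip, SUM(bytes) AS total FROM rawflows "
        "GROUP BY src_ip ORDER BY total DESC LIMIT ?",
        (n,),
    ).fetchall()
```

Parameterized queries (the `?` placeholders) keep web-supplied query values from being interpreted as SQL, which matters once the query interface is exposed through a browser.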


FlowScan+ 2.0 : NetFlow Versions

NetFlow Version   Comments
1                 Original
5                 Standard and most common
7                 Specific to Cisco Catalyst 6500 and 7600 Series Switches; similar to version 5, but does not include AS, interface, TCP flag & TOS information
8                 Choice of eleven aggregation schemes; reduces resource usage
9                 Flexible, extensible file export format to enable easier support of additional fields & technologies; coming out now: MPLS, Multicast, & BGP Next Hop


FlowScan+ 2.0 : Flow-tools

• NetFlow v5 & v7 have different PDU formats and do not carry the same information

• cflowd, the main NetFlow collection module in FlowScan, cannot collect NetFlow v7

• We had to change the NetFlow capture module

• Flow-tools replaces cflowd as the NetFlow v7 collection module

NetFlow v5
  FLOW index:     0xc7ffff
  router:         134.75.20.70
  src IP:         128.253.253.59
  dst IP:         210.98.25.11
  input ifIndex:  60
  output ifIndex: 14
  src port:       445
  dst port:       2979
  pkts:           6
  bytes:          744
  IP nexthop:     134.75.20.3
  start time:     Thu May 15 15:10:47 2003
  end time:       Thu May 15 15:10:51 2003
  protocol:       6
  tos:            0x0
  src AS:         17579
  dst AS:         17579
  src masklen:    16
  dst masklen:    19
  TCP flags:      0x1b (PUSH|SYN|FIN|ACK)
  engine type:    1
  engine id:      10

NetFlow v7
  FLOW index:     0xc7ffff
  router:         150.183.5.251
  src IP:         150.183.5.194
  dst IP:         150.183.138.216
  input ifIndex:  0
  output ifIndex: 0
  src port:       80
  dst port:       3215
  pkts:           6
  bytes:          497
  IP nexthop:     0.0.0.0
  start time:     Mon May 12 18:41:34 2003
  end time:       Mon May 12 18:41:34 2003
  protocol:       6
  tos:            0x0
  src AS:         0
  dst AS:         0
  src masklen:    0
  dst masklen:    0
  TCP flags:      0x0
  engine type:    0
  engine id:      0
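The version mismatch can be seen at the byte level. The following Python sketch (an illustration, not Flow-tools or cflowd code) dispatches on the version field of an export PDU. NetFlow v5 and v7 both use a 24-byte header and share the layout of the first 40 bytes of each flow record, but v5 records are 48 bytes long and v7 records 52, so a collector written only for the v5 layout would misframe v7 records. Field names follow Cisco's published record formats; the function name parse_pdu is hypothetical.

```python
import socket
import struct

HEADER_SIZE = 24               # v5 and v7 export headers are both 24 bytes
RECORD_SIZE = {5: 48, 7: 52}   # v7 records append extra fields (e.g. router_sc)

# First 40 bytes of a flow record, identical in v5 and v7:
# srcaddr, dstaddr, nexthop, input, output, dPkts, dOctets, First, Last,
# srcport, dstport, (v5 pad1 / v7 flags), tcp_flags, prot, tos
COMMON = struct.Struct("!4s4s4sHHIIIIHHBBBB")


def parse_pdu(pdu: bytes) -> list:
    """Parse a NetFlow v5 or v7 export PDU into a list of flow dicts."""
    version, count = struct.unpack_from("!HH", pdu, 0)
    if version not in RECORD_SIZE:
        raise ValueError(f"unsupported NetFlow version {version}")
    size = RECORD_SIZE[version]
    expected = HEADER_SIZE + count * size
    if len(pdu) != expected:
        raise ValueError(
            f"v{version} PDU with {count} records should be {expected} bytes, got {len(pdu)}"
        )
    flows = []
    for i in range(count):
        (src, dst, nexthop, in_if, out_if, pkts, octets, _first, _last,
         sport, dport, _flags, tcp_flags, proto, tos) = COMMON.unpack_from(
            pdu, HEADER_SIZE + i * size)
        flows.append({
            "version": version,
            "src_ip": socket.inet_ntoa(src),
            "dst_ip": socket.inet_ntoa(dst),
            "nexthop": socket.inet_ntoa(nexthop),
            "input_if": in_if,
            "output_if": out_if,
            "pkts": pkts,
            "bytes": octets,
            "src_port": sport,
            "dst_port": dport,
            "tcp_flags": tcp_flags,
            "protocol": proto,
            "tos": tos,
        })
    return flows
```

The record-size table is the part a v5-only collector lacks: the header and common prefix parse identically, and only the stride between records differs.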


FlowScan+ 2.0 : User Grouping

• There is no way to verify the user (ID) of the supercomputer
  – The only user-related information in NetFlow is the IP address
  – From this information, we can infer “who is generating the traffic”
• If users always connect to the supercomputer from the same system, they have the same source/destination IP: no problem
• But they can log in from other systems in the same office or the same building
  – So we adopt a user grouping concept
  – If a user logs in from a completely different place, it is impossible to identify the user (ID) from NetFlow
• Except in this situation, we can identify supercomputing users by the network IP in NetFlow


FlowScan+ 2.0 : User Grouping

[Figure: user group table with columns Group name, Group number, Group ID, and User ID or related information]

We have classified only class C IP addresses

- If one institute has many user IDs
- When we compare the traffic of a number of institutes with each other
- We should aggregate its total traffic
- Large grouping
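The small/large grouping can be sketched as a prefix lookup. In this hypothetical Python example, each small group is one class C (/24) network and each large group (an institute) is a set of small groups; the group names and the documentation-range networks are placeholders, not KISTI's actual group table.

```python
import ipaddress
from collections import Counter
from typing import Optional

# Hypothetical small groups: one class C (/24) network each, e.g. one office or lab.
SMALL_GROUPS = {
    "lab-1": ipaddress.ip_network("192.0.2.0/24"),
    "lab-2": ipaddress.ip_network("198.51.100.0/24"),
    "office-3": ipaddress.ip_network("203.0.113.0/24"),
}

# Hypothetical large groups: an institute that owns several small groups.
LARGE_GROUPS = {
    "institute-A": {"lab-1", "lab-2"},
    "institute-B": {"office-3"},
}


def small_group_of(ip: str) -> Optional[str]:
    """Return the small group whose /24 contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for name, net in SMALL_GROUPS.items():
        if addr in net:
            return name
    return None


def large_group_of(ip: str) -> Optional[str]:
    """Return the institute that owns ip's small group, or None."""
    small = small_group_of(ip)
    for institute, members in LARGE_GROUPS.items():
        if small in members:
            return institute
    return None


def bytes_by_institute(flows):
    """flows: iterable of (src_ip, dst_ip, bytes) between users and the supercomputers.

    Each flow is credited to whichever endpoint belongs to a known institute.
    """
    totals = Counter()
    for src, dst, nbytes in flows:
        institute = large_group_of(src) or large_group_of(dst) or "unknown"
        totals[institute] += nbytes
    return totals
```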


FlowScan+ 2.0 : Visualization

• FlowScan+, improved by adding MySQL, provides a query interface on a free DBMS to retrieve flow information

• But query results are text-based information
  – Difficult to understand intuitively
  – Results cannot be plotted as a time series
• To support this, FlowScan+ 2.0 adopts a visualization servlet


FlowScan+ 2.0 : Visualization

[Figure: visualization process & graph]

- Until now, text output has been the only way to see the results of the query interface
- If we want to see the results as a graphical plot over time, FlowScan+ 2.0 makes one more query to the DB
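The extra query behind a time-series plot essentially buckets flow records by time. The sketch below shows one way to do that in SQL, with Python's sqlite3 standing in for MySQL and a hypothetical minimal rawflows table; FlowScan+ 2.0 itself renders the plot with a Java servlet and JFreeChart, which this sketch does not reproduce. The 300-second default bin width is an assumption chosen to match the 5-minute graphing interval.

```python
import sqlite3

# Minimal hypothetical table (sqlite3 standing in for MySQL).
SCHEMA = ("CREATE TABLE IF NOT EXISTS rawflows "
          "(start_time INTEGER, src_ip TEXT, dst_ip TEXT, bytes INTEGER)")


def bytes_time_series(conn, ip, t_from, t_to, bin_seconds=300):
    """Return [(bin_start, bytes), ...] for one host, one row per non-empty bin."""
    return conn.execute(
        # Integer division maps each start_time to the start of its bin.
        "SELECT (start_time / ?) * ? AS bin, SUM(bytes) FROM rawflows "
        "WHERE (src_ip = ? OR dst_ip = ?) AND start_time >= ? AND start_time < ? "
        "GROUP BY bin ORDER BY bin",
        (bin_seconds, bin_seconds, ip, ip, t_from, t_to),
    ).fetchall()


def fill_gaps(rows, t_from, t_to, bin_seconds=300):
    """Insert zero-byte points for empty bins so the series is evenly spaced."""
    totals = dict(rows)
    first_bin = (t_from // bin_seconds) * bin_seconds
    return [(t, totals.get(t, 0)) for t in range(first_bin, t_to, bin_seconds)]
```

The padded series from fill_gaps is what a charting library would plot; without it, quiet intervals would silently disappear from the x-axis.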


Traffic Measurement topology

[Figure: traffic measurement topology. Labeled elements: supercomputers (IBM, NEC, Compaq, PC cluster); systems labeled Baram, Tiger, Kordic, Kfddi2, Lion, H-NFS, H-Opal, H-Ruby, and Ruby-8/80; two Cisco 7513 routers; Catalyst 6506 switches; NetFlow v7 export to FlowScan+ 2.0.]

• Our supercomputers are connected in a mesh with two Catalyst 6500 series switches

• NetFlow v7 export
• Graphs drawn every 5 min.
• Aggregated data & raw flows stored in the DB every 15 min.
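Storing aggregated data every 15 minutes amounts to summing flow records per fixed window. A minimal Python sketch, assuming flows are (start_time, protocol, packets, bytes) tuples and that aggregation is keyed by protocol; the actual aggregation keys used by FlowScan+ are not specified here, so treat them as hypothetical.

```python
from collections import defaultdict

WINDOW = 15 * 60  # aggregation window in seconds (15 minutes)


def aggregate(flows, window=WINDOW):
    """Sum raw flows into fixed windows.

    flows: iterable of (start_time, protocol, packets, bytes), start_time in UNIX seconds.
    Returns {(window_start, protocol): [flow_count, packets, bytes]}.
    """
    agg = defaultdict(lambda: [0, 0, 0])
    for start, proto, pkts, nbytes in flows:
        key = (start - start % window, proto)  # align to the window boundary
        row = agg[key]
        row[0] += 1
        row[1] += pkts
        row[2] += nbytes
    return dict(agg)
```

Keeping a flow count alongside packets and bytes is useful later: some anomalies show up as a jump in flows while byte volume stays normal.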


Top users (by institute)

Institute               Bytes (MB)       %
KMA                         48,135  47.22%
Seoul National Univ.        11,433  11.22%
KISTI                       10,319  10.12%
Air Force                    9,609   9.43%
KAIST                        3,713   3.64%
Yonsei Univ.                 2,912   2.86%
ETS soft                     1,063   1.04%
Kyung Hee Univ.                451   0.44%
Choongnam Univ.                416   0.41%
Pusan National Univ.           415   0.41%

FlowScan+ 2.0 – traffic analysis

(July 21, 2003, 14:00 ~ July 28, 2003, 14:00)

- One week of measured traffic
- Analyzed by large group
- The pie graph was redrawn from the Excel sheets


Application

Service     Bytes (MB)       %
http           547,902  47.34%
ftp            491,691  42.48%
unknown        115,319   9.96%
telnet           2,216   0.19%
domain             273   0.02%

FlowScan+ 2.0 – traffic analysis

(July 21, 2003, 14:00 ~ July 28, 2003, 14:00)

• It shows an unexpected result
• We wanted to know the portion occupied by the various applications
  – Applications in bio, physics, aerospace, chemistry, and so on
• But those applications run on the supercomputers
  – They are installed on the supercomputers
  – Users log in to the supercomputers by telnet and ftp
  – They transfer their data and operate the applications from remote sites
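The service breakdown above comes from classifying flows by port. Here is a minimal Python sketch of port-based classification, assuming only the well-known IANA ports for the listed services; the table and function names are hypothetical, and FlowScan+'s own service configuration may differ. One general limitation of pure port lookup: passive-mode FTP data connections use negotiated high ports, so a classifier like this would count them as "unknown".

```python
from collections import Counter

# Well-known (IANA) ports for the services in the table above.
SERVICE_PORTS = {20: "ftp", 21: "ftp", 23: "telnet", 53: "domain", 80: "http"}


def classify(src_port, dst_port):
    """Name the service by whichever endpoint uses a well-known port."""
    for port in (dst_port, src_port):  # prefer the destination (server) port
        if port in SERVICE_PORTS:
            return SERVICE_PORTS[port]
    return "unknown"


def service_shares(flows):
    """flows: iterable of (src_port, dst_port, bytes).

    Returns {service: (bytes, percent_of_total)}, largest service first.
    """
    totals = Counter()
    for sport, dport, nbytes in flows:
        totals[classify(sport, dport)] += nbytes
    grand = sum(totals.values())
    return {svc: (b, 100.0 * b / grand if grand else 0.0)
            for svc, b in totals.most_common()}
```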


Other usage of FlowScan+ 2.0

• Detection of network abnormalities
  – Port scanning
  – Code Red virus
  – NIMDA virus
    • Mass-mailing worm component
  – DDoS attack
    • Characteristic relation between flows and traffic amount
    • Bytes: normal-size traffic
    • Flows: explosive increase

• Detection of newly emerging applications
  – GRID applications, P2P applications, and so on
  – If we match newly emerging applications with their defined port numbers
    • The unknown traffic portion decreases
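The byte/flow signature described above (normal byte volume, explosive flow count) can be turned into a simple check. This is a hypothetical heuristic, not FlowScan+'s detector: it flags an interval whose flow count exceeds flow_factor times the median of earlier intervals while its byte count stays within byte_factor times the byte median. Both thresholds and the warm-up length are arbitrary assumptions.

```python
from statistics import median


def flag_flow_surges(samples, flow_factor=5.0, byte_factor=2.0, warmup=3):
    """Flag intervals where flows explode but bytes stay near normal.

    samples: list of (flow_count, byte_count), one per measurement interval.
    Returns the indices of flagged intervals; the first `warmup` intervals
    only build the baseline.
    """
    flagged = []
    for i in range(warmup, len(samples)):
        history = samples[:i]
        base_flows = median(f for f, _ in history)
        base_bytes = median(b for _, b in history)
        flows, nbytes = samples[i]
        if flows > flow_factor * base_flows and nbytes <= byte_factor * base_bytes:
            flagged.append(i)
    return flagged
```

A surge with proportionally large byte volume (for example a big file transfer) is deliberately not flagged, since many small flows carrying little data is the pattern associated with scanning and worm propagation.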


FlowScan+ of KISTI


Conclusions

• FlowScan+ developed by KISTI & KAIST
• Characteristics of FlowScan+ 2.0
  – Flow-tools
    • Addresses the NetFlow version problem
  – Group edit
    • Enables traffic measurement & analysis per user
  – Visualization of results
    • Plots results graphically as a time series

• Future work
  – DB optimization for speed
  – Installation packaging
  – More stability for flowscan
  – Combining the merits of each version


Thank you for your attention

Questions?