Keynote talk at PDCS – September 16, 2004

Parallel and Distributed Computing for Cyber Security

Vipin Kumar
Army High Performance Computing Research Center
Department of Computer Science, University of Minnesota
http://www.cs.umn.edu/~kumar

Collaborators: Paul Dokas, Eric Eilertson, Levent Ertoz, Aleksandar Lazarevic, Michael Steinbach, George Simon, Jaideep Srivastava, Pang-Ning Tan, Varun Chandola, Yongdae Kim, Zhi-li Zhang

Progress in HPC - past 6 decades

ENIAC – 1945
• 100 kHz
• 5 K additions/second
• 357 multiplications/second

[Images: Japanese Earth Simulator, IBM Blue Gene, Virginia Tech InfiniBand Cluster]

• CPU power increasing by a factor of 50-100 every decade
• Multi-GHz, gigabyte PCs are a commodity
• Teraflop computers are common at large organizations
• Petaflop-scale computing is within reach in this decade

Applications Drive the Technology

“I think there is a world market for maybe five computers”
- Thomas Watson Sr. (1943)

Scientific Computing
Data Driven Computing

Data Mining - A Driver for Parallel/ Distributed Computing

• Lots of data being collected in the commercial and scientific worlds
• Strong competitive pressure to extract and use the information from the data
• Scaling of data mining to large data requires HPC
• Data and/or computational resources needed for analysis are often distributed
• Sometimes the choice is distributed data mining or no data mining
– Ownership, privacy, security issues

[Figure: several networks connected through the Internet]

Cyber Intrusion Detection - Motivation

[Figure: Spread of the SQL Slammer worm 10 minutes after its deployment]

[Figure: Incidents reported to the Computer Emergency Response Team/Coordination Center, by year, 1990 to 2003; vertical axis from 0 to 120,000]

• Sophistication of cyber attacks and their severity are increasing
– Large-scale denial of service attacks
– Identity theft / fraud
– Espionage

• DOD and other U.S. Government agencies are major targets for sophisticated state-sponsored cyber attacks

• Security mechanisms always have inevitable vulnerabilities
– Firewalls are not sufficient to ensure security in computer networks
– Insider attacks are difficult to detect

Experts Race to Beat Computer Worm

U.S., Canada Try to Thwart Sobig by Disconnecting 17 Machines

By Brian Krebs, Special to The Washington Post

Computer-security experts working with law enforcement officials in the United States and Canada raced yesterday to contain the Sobig.F computer worm before it could launch a new attack as authorities reported progress on finding the source of the virus.

Security experts who cracked the worm's code late Thursday night found that Sobig instructed infected computers to try to contact one of 20 other computers yesterday afternoon to download new instructions -- to do what is as yet unknown. But the worm either failed to seek those instructions or it was thwarted from doing so when security experts disconnected 17 of the 20 targeted computers before the anticipated 3 p.m. attack.

The computer worm was one of at least three viruses that have brought corporate, personal and government computer networks to a crawl over the past two weeks.

The FBI served a grand jury subpoena yesterday on EasyNews.com, a Phoenix-based Internet service provider whose network may have been used as a starting point for the Sobig worm.

The worm is thought to have been released originally on Usenet, a sort of Internet bulletin board, by someone who had an account at EasyNews.com, according to Michael Minor, the company's co-owner. The account was paid for with a stolen credit card number and established minutes before the virus was released on the Internet on Monday, Minor said. He added that the company is cooperating with the FBI.

The virus was disguised on Usenet as a pornographic photograph in an adult news group, Minor said. People who clicked on the photo had their PC infected with the virus, which then began to e-mail itself to every address on the infected computer's e-mail address book.

FBI cyber division spokesman Bill Murray said the bureau and the Department of Homeland Security would do everything they could, including serving subpoenas, to track the source of the worm.

The Sobig.F worm, a variation of a virus that's been around since January, quickly spread out of control this month. America Online Inc., the world's largest online service, reported that nearly 60 percent of the 38 million attachments to e-mail messages that it filtered Thursday contained the Sobig.F virus.

This weekend, a denial-of-service attack took down the Web site of The SCO Group, which is caught in an increasingly acrimonious row with the open-source community over the company's legal campaign against Linux.

SCO's Web site was largely out of commission until Monday morning, a representative of the Lindon, Utah-based Unix and Linux seller said Monday. Performance measurement statistics from Netcraft indicated that the site had been down since Friday night.

SCO claims that IBM illegally inserted Unix code into its version of Linux and has sent letters to corporations, warning them that they may be violating copyright laws by using the Linux operating system.

Raymond, president of the Open Source Initiative advocacy group, urged the hacker, if a member of the open-source community, to stop the attack, because it could do more harm than good.

Hackers cut off SCO Web site

Martin LaMonica, Staff Writer, August 25, 2003

It has been an anxious few weeks for computer users trying to ward off various worms and viruses, which can take over their laptops and desktop machines and even bring businesses and public agencies to a halt. Malicious computer programs have been around for years, with hundreds of viruses apt to be circulating at any one time, but the attacks this month have been particularly trying. Yesterday experts barely headed off a programmed acceleration of a viral attack that has already flooded the Internet with hundreds of millions of infected e-mail messages.

The latest round of computer woes started earlier this month when the Blaster worm took advantage of a weakness in the Windows operating system to invade computer hard drives, where it slowed operations and moved on to attack other computers as well. This irritation was promptly compounded by a similar worm, called Nachi or Welchia, that seems to have been designed to follow in the footsteps of Blaster.

Just as these problems seemed to be receding, the latest version of a virus called SoBig invaded many computers in the form of attachments to e-mail notes. When an unwary user opened such an attachment, the virus could steal all the e-mail addresses residing on the computer and mail copies of itself to those people as well, inviting the unwary to click on the attachment and start the infection moving again.

Virtually everyone involved in the Internet can share some of the blame for these lapses in security. Microsoft, whose Windows operating systems were the target of the worms, can be faulted for failing to design and test its software adequately. Internet service providers have not stressed security as much as ease of connection. Government and corporate network administrators have sometimes been slow to safeguard their systems. And individuals have been notoriously lax about keeping their antivirus protection up to date. Indeed, Microsoft offered a free software patch to protect against the Blaster worm in July, yet many users never bothered to download it.

An Onslaught of Computer Viruses

August 23, 2003

Blackout spurs cyberattack worry

By Kevin Maney and Michelle Kessler, USA TODAY, August 19, 2003

The electric power grid might be more vulnerable to a cyberattack today than it was on Sept. 11, 2001.

Officials doubt last week's massive blackout was caused by a terrorist or domestic hacker breaking into an electric power system via the Internet. Yet, the incident again brought to the forefront concerns that such an attack is possible.

"This power infrastructure is all Band-Aids and baling wire. And, of course, it's all dependent on computers," says Peter Neumann of research firm SRI International. "This stuff is riddled with security and reliability flaws."

Hackers Steal 13,000 Credit Card Numbers

Navy Says No Fraud Has Been Noticed

By Anitha Reddy, Washington Post Staff Writer

Saturday, August 23, 2003

The Navy has canceled 13,000 credit cards used for government expenses after discovering that hackers had downloaded card numbers and billing records, Defense Department officials said.

often you'll find that the administrative networks are segmented from the core of the Department of Defense and that maybe they don't provide as much security as some of the core networks."

Citibank finished mailing new cards Wednesday to replace the 13,000 that were compromised, said Glenn

What are Intrusions?

Intrusions are actions that attempt to bypass security mechanisms of computer systems. They are caused by:
– Attackers accessing the system from the Internet
– Insider attackers: authorized users attempting to gain and misuse non-authorized privileges

[Figure: typical intrusion scenario. Labels: attacker, scanning activity, computer network, machine with vulnerability]

What are Intrusions?

[Figure: typical intrusion scenario. Labels: attacker, scanning activity, computer network, compromised machine]

Intrusion Detection Systems

www.snort.org

Example of SNORT rule (MS-SQL “Slammer” worm)

any -> udp port 1434 (content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send")
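To make the idea of a signature concrete, here is a minimal Python sketch of how the three content checks in the rule above could be evaluated against one packet. It is not Snort's matching engine: the function name, the packet representation, and the sample payloads are hypothetical, and the sketch ignores match order and offsets. Only the port number, the byte pattern, and the two keywords come from the slide.

```python
# Conditions copied from the slide's rule; everything else is illustrative.
SLAMMER_PORT = 1434
SLAMMER_CONTENTS = (
    bytes.fromhex("81F10301049B81F101"),  # |81 F1 03 01 04 9B 81 F1 01|
    b"sock",
    b"send",
)


def matches_slammer_signature(protocol: str, dst_port: int, payload: bytes) -> bool:
    """Return True when the packet satisfies every condition of the rule."""
    if protocol.lower() != "udp" or dst_port != SLAMMER_PORT:
        return False
    # Every content pattern must occur somewhere in the payload.
    return all(pattern in payload for pattern in SLAMMER_CONTENTS)


# Hypothetical payloads, constructed only to exercise the matcher.
suspicious = b"\x04" + bytes.fromhex("81F10301049B81F101") + b"..sock..send"
benign = b"ordinary resolution request"

print(matches_slammer_signature("udp", 1434, suspicious))
print(matches_slammer_signature("udp", 1434, benign))
print(matches_slammer_signature("tcp", 1434, suspicious))
```

A matcher of this kind only recognizes byte patterns someone has already written down, which is the root of the limitations listed on this slide.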

Intrusion Detection System
– Combination of software and hardware that attempts to perform intrusion detection
– Raises the alarm when a possible intrusion happens

• Traditional intrusion detection system (IDS) tools are based on signatures of known attacks

Limitations
– Signature database has to be manually revised for each new type of discovered intrusion
– Substantial latency in deployment of newly created signatures across the computer system
– Cannot detect emerging cyber threats
– Not suitable for detecting policy violations and insider abuse
– Do not provide understanding of network traffic
– Generate too many false alarms
– Not suited for detecting multi-step attacks

Data Mining for Intrusion Detection

• Increased interest in data mining based intrusion detection over the past decade

• Misuse detection
– Suitable for attacks for which it is difficult to build signatures
– Builds predictive models from labeled data sets (instances are labeled as “normal” or “intrusive”) to identify known intrusions
– Cannot detect unknown and emerging attacks
– Madam ID project, ADAM project, fuzzy association rules [Bridges00], decision trees [Sinclair99], neural networks [Lippmann00, Ghosh99], genetic algorithms [Bridges00, Sinclair99], cost sensitive modeling (AdaCost [Fan99], MetaCost [Domingos99, Ting00]), learning from rare class [Kubat97, Fawcett97, Provost01, Japkowicz01, Joshi02, Lazarevic03]

• Anomaly detection
– Detects emerging/novel attacks as deviations from “normal” behavior
– Potential high false alarm rate: previously unseen (yet legitimate) system behaviors may also be recognized as anomalies
– PHAD, ALAD [Chan01, Cha02], ADAM [Barbara01], finite mixture model [Yamanishi00], χ²-based [Ye01], temporal sequence learning [Lane98], neural networks [Ryan98], generating artificial anomalies [Fan01], clustering [Eskin02], unsupervised SVM [Eskin02, Lazarevic03], outlier detection schemes (MINDS), Bayesian net [Valdes00], Hidden Markov models [Ourston03]
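The anomaly detection idea, scoring a connection by how far it deviates from “normal” behavior, can be illustrated with a very small distance-based outlier score. The sketch below is an assumption-laden toy, not the MINDS algorithm or any of the cited methods: the two features, the reference data, and the choice of the distance to the k-th nearest neighbor as the score are all hypothetical.

```python
import math


def knn_anomaly_score(point, reference, k=2):
    """Distance from `point` to its k-th nearest neighbor in `reference`.

    A larger value means the point lies farther from the behavior seen so far.
    """
    distances = sorted(math.dist(point, ref) for ref in reference)
    return distances[k - 1]


# Hypothetical feature vectors: (packets, bytes / 100) for past connections.
normal_profile = [(10, 12.0), (11, 12.5), (9, 11.0), (12, 13.0), (10, 11.5)]

typical = (10, 12.2)  # resembles the reference connections
unusual = (1, 0.4)    # unlike anything in the reference set

print(knn_anomaly_score(typical, normal_profile))
print(knn_anomaly_score(unusual, normal_profile))
```

The same sketch shows where the false alarms come from: a legitimate connection that simply has no close neighbor in the reference set receives a high score too.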

Data Mining for Intrusion Detection

Misuse Detection – Building Predictive Models

Training Set (attribute types marked on the slide: categorical, temporal, continuous, class)

Tid  SrcIP          Start time  Dest IP         Dest Port  Number of bytes  Attack
1    206.135.38.95  11:07:20    160.94.179.223  139        192              No
2    206.163.37.95  11:13:56    160.94.179.219  139        195              No
3    206.163.37.95  11:14:29    160.94.179.217  139        180              No
4    206.163.37.95  11:14:30    160.94.179.255  139        199              No
5    206.163.37.95  11:14:32    160.94.179.254  139        19               Yes
6    206.163.37.95  11:14:35    160.94.179.253  139        177              No
7    206.163.37.95  11:14:36    160.94.179.252  139        172              No
8    206.163.37.95  11:14:38    160.94.179.251  139        285              Yes
9    206.163.37.95  11:14:41    160.94.179.250  139        195              No
10   206.163.37.95  11:14:44    160.94.179.249  139        163              Yes

Training Set → Learn Classifier → Model

Connections to be classified

Tid  SrcIP          Start time  Dest IP         Number of bytes  Attack
1    206.163.37.81  11:17:51    160.94.179.208  150              ?
2    206.163.37.99  11:18:10    160.94.179.235  208              ?
3    206.163.37.55  11:34:35    160.94.179.221  195              ?
4    206.163.37.37  11:41:37    160.94.179.253  199              ?
5    206.163.37.41  11:55:19    160.94.179.244  181              ?

Summarization of attacks using association rules

Rules Discovered:
{Src IP = 206.163.37.95, Dest Port = 139, Bytes in [150, 200]} --> {ATTACK}

Key Technical Challenges
• Large data size
• High dimensionality
• Temporal nature of the data
• Skewed class distribution
• Data preprocessing
• On-line analysis

Other panels on the slide: Clustering & Anomaly Detection; Link Analysis (Hubs, Authorities); Live data
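As an illustration of what a discovered rule of this form means operationally, the following Python sketch represents the rule from the slide as a small object and checks which connection records it covers. The class and field names are hypothetical, and treating the byte range as a closed interval is an assumption; the three sample records are rows 1, 5, and 10 of the training table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    tid: int
    src_ip: str
    dst_port: int
    num_bytes: int


@dataclass(frozen=True)
class Rule:
    src_ip: str
    dst_port: int
    min_bytes: int  # assumed inclusive
    max_bytes: int  # assumed inclusive

    def matches(self, conn: Connection) -> bool:
        """True when the connection satisfies every condition of the rule."""
        return (
            conn.src_ip == self.src_ip
            and conn.dst_port == self.dst_port
            and self.min_bytes <= conn.num_bytes <= self.max_bytes
        )


# {Src IP = 206.163.37.95, Dest Port = 139, Bytes in [150, 200]} --> {ATTACK}
rule = Rule("206.163.37.95", 139, 150, 200)

sample = [
    Connection(1, "206.135.38.95", 139, 192),   # different source IP
    Connection(5, "206.163.37.95", 139, 19),    # byte count outside the range
    Connection(10, "206.163.37.95", 139, 163),  # satisfies all three conditions
]

matched_tids = [conn.tid for conn in sample if rule.matches(conn)]
print(matched_tids)
```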

MINDS – Minnesota INtrusion Detection System

[Figure: MINDS system. Network traffic passes through a data capturing device (net flow tools, tcpdump), filtering, and feature extraction. Known attack detection outputs detected known attacks. Anomaly detection outputs anomaly scores to a human analyst, who identifies detected novel attacks. The analyst's labels feed association pattern analysis, which produces a summary and characterization of attacks.]

• Data mining based anomaly detection system
• Used at the University of Minnesota to analyze network traffic to/from 40,000 computers
• Incorporated into Interrogator architecture at ARL Center for Intrusion Monitoring and Protection (CIMP), PoC: Bencevenko and Long (ARL)
• Helps analyze data from multiple sensors at DoD sites around the country
• Routinely detects attacks and intrusive behavior not detected by widely used intrusion detection systems
– Insider Abuse / Policy Violations / Worms / Scans
• ARL-CIMP considers MINDS as the first effective anomaly intrusion detection system

MINDS – Typical Anomaly Detection Output

score srcIP sPort dstIP dPort protocol flags packets bytes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
37674.69 63.150.X.253 1161 128.101.X.29 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.59 0 0 0 0 0
26676.62 63.150.X.253 1161 160.94.X.134 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.59 0 0 0 0 0
24323.55 63.150.X.253 1161 128.101.X.185 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
21169.49 63.150.X.253 1161 160.94.X.71 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
19525.31 63.150.X.253 1161 160.94.X.19 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
19235.39 63.150.X.253 1161 160.94.X.80 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
17679.1 63.150.X.253 1161 160.94.X.220 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
8183.58 63.150.X.253 1161 128.101.X.108 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.58 0 0 0 0 0
7142.98 63.150.X.253 1161 128.101.X.223 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
5139.01 63.150.X.253 1161 128.101.X.142 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
4048.49 142.150.Y.101 0 128.101.X.127 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
4008.35 200.250.Z.20 27016 128.101.X.116 4629 17 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
3657.23 202.175.Z.237 27016 128.101.X.116 4148 17 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
3450.9 63.150.X.253 1161 128.101.X.62 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
3327.98 63.150.X.253 1161 160.94.X.223 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2796.13 63.150.X.253 1161 128.101.X.241 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2693.88 142.150.Y.101 0 128.101.X.168 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
2683.05 63.150.X.253 1161 160.94.X.43 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2444.16 142.150.Y.236 0 128.101.X.240 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
2385.42 142.150.Y.101 0 128.101.X.45 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
2114.41 63.150.X.253 1161 160.94.X.183 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2057.15 142.150.Y.101 0 128.101.X.161 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1919.54 142.150.Y.101 0 128.101.X.99 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1634.38 142.150.Y.101 0 128.101.X.219 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1596.26 63.150.X.253 1161 128.101.X.160 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
1513.96 142.150.Y.107 0 128.101.X.2 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1389.09 63.150.X.253 1161 128.101.X.30 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
1315.88 63.150.X.253 1161 128.101.X.40 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
1279.75 142.150.Y.103 0 128.101.X.202 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1237.97 63.150.X.253 1161 160.94.X.32 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1180.82 63.150.X.253 1161 128.101.X.61 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0

– Anomalous connections that correspond to the “slammer” worm
– Anomalous connections that correspond to the ping scan
– Connections corresponding to UM machines connecting to “half-life” game servers

Summarization Using Association Patterns

[Figure (MINDS): the anomaly detection system outputs ranked connections, which are split into attack and normal sets; a discriminating association pattern generator derives rules from the two sets and updates a knowledge base]

Example rules:
R1: TCP, DstPort=1863 --> Attack
R100: TCP, DstPort=80 --> Normal

1. Build normal profile
2. Study changes in normal behavior
3. Create attack summary
4. Detect misuse behavior
5. Understand nature of the attack
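One simple way to picture a discriminating pattern is as an attribute combination that is much more frequent among the connections ranked as attacks than among those ranked as normal. The Python sketch below illustrates that idea only; it is not the pattern generator used in MINDS. The function name, the support-ratio measure, the add-one smoothing, the threshold, and the connection counts are all assumptions made for the example.

```python
from collections import Counter


def discriminating_patterns(attack, normal, min_ratio=3.0):
    """Rank patterns by how much more frequent they are in `attack` than in `normal`."""
    attack_counts = Counter(attack)
    normal_counts = Counter(normal)
    scored = []
    for pattern, count in attack_counts.items():
        attack_support = count / len(attack)
        # Add-one smoothing (an assumption) avoids division by zero for
        # patterns that never occur in normal traffic.
        normal_support = (normal_counts[pattern] + 1) / (len(normal) + 1)
        ratio = attack_support / normal_support
        if ratio >= min_ratio:
            scored.append((pattern, ratio))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical (protocol, destination port) pairs taken from the top and the
# bottom of a ranked connection list.
attack = [("tcp", 1863)] * 6 + [("tcp", 80)] * 2
normal = [("tcp", 80)] * 18 + [("tcp", 22)] * 2

for pattern, ratio in discriminating_patterns(attack, normal):
    print(pattern, round(ratio, 2))
```

With these made-up counts only the port 1863 pattern clears the threshold, while port 80 is dropped because it is common in the normal set as well.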

Typical MINDS Output

score c1 c2 src IP sPort dst IP dPort protocol flags packets bytes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
31.2 - - 218.19.X.168 5002 134.84.X.129 4182 6 27 [5,6) [0,2045) 0 0.01 0.01 0.03 0 0 0 0 0 0 0 0 0 0 1 0
3.04 138 12 64.156.X.74 ----- xxx.xxx.xxx.xxx ----- xxx 4 [0,2) [0,2045) 0.12 0.48 0.26 0.58 0 0 0 0 0.07 0.27 0 0 0 0 0 0
15.4 - - 218.19.X.168 5002 134.84.X.129 4896 6 27 [5,6) [0,2045) 0.01 0.01 0.01 0.06 0 0 0 0 0 0 0 0 0 0 1 0
14.4 - - 134.84.X.129 4770 218.19.X.168 5002 6 27 [5,6) [0,2045) 0.01 0.01 0.05 0.01 0 0 0 0 0 0 1 0 0 0 0 0
7.81 - - 134.84.X.129 3890 218.19.X.168 5002 6 27 [5,6) [0,2045) 0.01 0.02 0.09 0.02 0 0 0 0 0 0 1 0 0 0 0 0
3.09 4 1 xxx.xxx.xxx.xxx 4729 xxx.xxx.xxx.xxx ----- 6 ------ --------- --------- 0.14 0.33 0.17 0.47 0 0 0 0 0 0 0.2 0 0 0 0 0
2.41 64 8 xxx.xxx.xxx.xxx ----- 200.75.X.2 ----- xxx ------ --------- [0,2045) 0.33 0.27 0.21 0.49 0 0 0 0 0 0 0 0 0.28 0.25 0.01 0
6.64 - - 218.19.X.168 5002 134.84.X.129 3676 6 27 [5,6) [0,2045) 0.03 0.03 0.03 0.15 0 0 0 0 0 0 0 0 0 0 0.99 0
5.6 - - 218.19.X.168 5002 134.84.X.129 4626 6 27 [5,6) [0,2045) 0.03 0.03 0.03 0.17 0 0 0 0 0 0 0 0 0 0 0.98 0
2.7 12 0 xxx.xxx.xxx.xxx ----- xxx.xxx.xxx.xxx 113 6 2 [0,2) [0,2045) 0.25 0.09 0.15 0.15 0 0 0 0 0 0 0.08 0 0.79 0.15 0.01 0
4.39 - - 218.19.X.168 5002 134.84.X.129 4571 6 27 [5,6) [0,2045) 0.04 0.05 0.05 0.26 0 0 0 0 0 0 0 0 0 0 0.96 0
4.34 - - 218.19.X.168 5002 134.84.X.129 4572 6 27 [5,6) [0,2045) 0.04 0.05 0.05 0.23 0 0 0 0 0 0 0 0 0 0 0.97 0
4.07 8 0 160.94.X.114 51827 64.8.X.60 119 6 24 [483,-) [8424,-) 0.09 0.26 0.16 0.24 0 0 0 0.91 0 0 0 0 0 0 0 0
3.49 - - 218.19.X.168 5002 134.84.X.129 4525 6 27 [5,6) [0,2045) 0.06 0.06 0.06 0.35 0 0 0 0 0 0 0 0 0 0 0.93 0
3.48 - - 218.19.X.168 5002 134.84.X.129 4524 6 27 [5,6) [0,2045) 0.06 0.06 0.07 0.35 0 0 0 0 0 0 0 0 0 0 0.93 0
3.34 - - 218.19.X.168 5002 134.84.X.129 4159 6 27 [5,6) [0,2045) 0.06 0.07 0.07 0.37 0 0 0 0 0 0 0 0 0 0 0.92 0
2.46 51 0 200.75.X.2 ----- xxx.xxx.xxx.xxx 21 6 2 --------- [0,2045) 0.19 0.64 0.35 0.32 0 0 0 0 0.18 0.44 0 0 0 0 0 0
2.37 42 5 xxx.xxx.xxx.xxx 21 200.75.X.2 ----- 6 20 --------- [0,2045) 0.35 0.31 0.22 0.57 0 0 0 0 0 0 0 0 0.18 0.28 0.01 0
2.45 58 0 200.75.X.2 ----- xxx.xxx.xxx.xxx 21 6 ------ --------- [0,2045) 0.19 0.63 0.35 0.32 0 0 0 0 0.18 0.44 0 0 0 0 0 0

– UMN computer connecting to a remote FTP server, running on port 5002
– Summarized TCP reset packets received from 64.156.X.74, which is a victim of DoS attack, and we were observing backscatter, i.e. replies to spoofed packets
– Summarization of FTP scan from a computer in Columbia, 200.75.X.2
– Summary of IDENT lookups, where a remote computer tries to get user name
– Summarization of a USENET server transferring a large amount of data

Typical MINDS Output

score c1 c2 src IP sPort dst IP dPort protocol flags packets bytes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
57973 - - 128.101.X.1 56025 192.67.X.205 22 tcp ---AP--- [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
6530 - - 141.213.X.100 4354 160.94.X.142 59999 tcp ---AP-SF [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
3227 - - 192.67.X.206 43710 128.101.X.1 22 tcp ---AP--- [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
1534 - - 160.94.X.142 59999 141.213.X.100 4354 tcp ---A--SF [32k,1M] [3M,8M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
19.3 9 67 193.62.X.38 ----- 160.94.X.132 ----- tcp ---A--SF --------- --------- 0.3 0.3 0.3 0.5 0 0 0 0.1 0.1 0 0.1 0 0.1 0 0 0
14.9 23 81 134.84.X.117 ----- xxx.xxx.xxx.xxx ----- tcp ---AP--- --------- --------- 0.3 0.3 0.3 0.5 0 0 0 0.1 0.1 0 0.1 0 0.1 0 0 0
26.6 81 258 208.2.X.101 ----- xxx.xxx.xxx.xxx 139 tcp ------S- [4,4] --------- 0.2 0.3 0.3 0.4 0 0 0 0 0.1 0 0.1 0 0.1 0 0 0
88.2 5 1 208.2.X.101 ----- xxx.xxx.xxx.xxx 139 tcp ------S- [4,4] [200,200] 0 0.1 0 0.1 0 0 0 0 0 0 0 0 0 0 1 0
143 - - 160.94.X.132 35755 193.62.X.38 45288 tcp ---A---F [32k,1M] [1M,3M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
117 - - 144.34.X.164 1676 128.101.X.190 22 tcp ---A---- [32k,1M] [1M,3M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
13.4 4 31 128.101.X.204 ----- xxx.xxx.xxx.xxx ----- tcp ---A---F --------- --------- 0.3 0.3 0.3 0.5 0 0 0 0.2 0.1 0 0.1 0 0 0 0 0
12.3 11 101 xxx.xxx.xxx.xxx ----- 134.84.X.117 ----- tcp ---AP--- --------- --------- 0.3 0.2 0.5 0.3 0 0 0 0.1 0.1 0.1 0.1 0 0.1 0 0 0
58.9 - - 134.84.X.2 554 67.40.X.170 62727 tcp ---AP-S- [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
54 - - 128.101.X.39 54906 65.221.X.2 50789 tcp ---AP--- [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
34.4 - - 62.70.X.101 17534 134.84.X.43 6881 tcp ---AP--- [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
28.4 - - 220.120.X.249 15074 160.94.X.1 2355 tcp ---AP--- [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
12.1 23 73 xxx.xxx.xxx.xxx 57 216.196.X.78 ----- tcp ---A-R-- --------- --------- 0.2 0.3 0.3 0.4 0 0 0 0 0.2 0 0 0 0.2 0 0 0

– UMN computers doing bulk transfers
– 160.94.122.142 is running a rogue FTP server on 60000/TCP
– UMN computers doing large transfers via BitTorrent to many outside hosts
– This computer is scanning for computers on port 139/TCP. Majority of the packets are 192 bytes or 144 bytes, except for the second summary (score 88.2)
– UMN computer running a RealMedia server, that was not known to the analyst
– Odd looking P2P traffic to/from a UMN computer (potentially KaZaA or Gnutella)
– The remote computer was scanning for 57/TCP, where RESET packets are sent back from computers that do not have 57/TCP open.

Page 18: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Keynote talk at PDCS – September 16,2004

Typical Summarization OutputMINDS

score c1 c2 src IP sPort dst IP dPort protocolflags packets bytes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

57973 - - 128.101.X.1 56025 192.67.X.205 22 tcp ---AP--- [32k,1M][8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

6530 - - 141.213.X.100 4354 160.94.X.142 59999 tcp ---AP-SF [32k,1M][8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

3227 - - 192.67.X.206 43710 128.101.X.1 22 tcp ---AP--- [32k,1M][8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

1534 - - 160.94.X.142 59999 141.213.X.100 4354 tcp ---A--SF [32k,1M][3M,8M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

19.3 9 67 193.62.X.38 ----- 160.94.X.132 ----- tcp ---A--SF --------- --------- 0.3 0.3 0.3 0.5 0 0 0 0.1 0.1 0 0.1 0 0.1 0 0 0

14.9 23 81 134.84.X.117 ----- xxx.xxx.xxx.xxx----- tcp ---AP--- --------- --------- 0.3 0.3 0.3 0.5 0 0 0 0.1 0.1 0 0.1 0 0.1 0 0 0

26.6 81 258 208.2.X.101 ----- xxx.xxx.xxx.xxx 139 tcp ------S- [4,4] --------- 0.2 0.3 0.3 0.4 0 0 0 0 0.1 0 0.1 0 0.1 0 0 0

88.2 5 1 208.2.X.101 ----- xxx.xxx.xxx.xxx 139 tcp ------S- [4,4] [200,200] 0 0.1 0 0.1 0 0 0 0 0 0 0 0 0 0 1 0

143 - - 160.94.X.132 35755 193.62.X.38 45288 tcp ---A---F [32k,1M][1M,3M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

117 - - 144.34.X.164 1676 128.101.X.190 22 tcp ---A---- [32k,1M][1M,3M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

13.4 4 31 128.101.X.204----- xxx.xxx.xxx.xxx----- tcp ---A---F --------- --------- 0.3 0.3 0.3 0.5 0 0 0 0.2 0.1 0 0.1 0 0 0 0 0

12.3 11 101 xxx.xxx.xxx.xxx----- 134.84.X.117 ----- tcp ---AP--- --------- --------- 0.3 0.2 0.5 0.3 0 0 0 0.1 0.1 0.1 0.1 0 0.1 0 0 0

58.9 - - 134.84.X.2 554 67.40.X.170 62727 tcp ---AP-S- [32k,1M][8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

54 - - 128.101.X.39 54906 65.221.X.2 50789 tcp ---AP--- [32k,1M][8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

34.4 - - 62.70.X.101 17534 134.84.X.43 6881 tcp ---AP--- [32k,1M][8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

28.4 - - 220.120.X.249 15074 160.94.X.1 2355 tcp ---AP--- [32k,1M][8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

12.1 23 73 xxx.xxx.xxx.xxx 57 216.196.X.78 ----- tcp ---A-R-- --------- --------- 0.2 0.3 0.3 0.4 0 0 0 0 0.2 0 0 0 0.2 0 0 0

• UMN computers doing bulk transfers
• 160.94.122.142 is running a rogue FTP server on 60000/TCP
• UMN computers doing large transfers via BitTorrent to many outside hosts
• This computer is scanning for computers on port 139/TCP. The majority of the packets are 192 bytes or 144 bytes, except for the second summary (score 88.2)
• UMN computer running a RealMedia server that was not known to the analyst
• Odd-looking P2P traffic to/from a UMN computer (potentially KaZaA or Gnutella)
• The remote computer was scanning for 57/TCP, where RESET packets are sent back from computers that do not have 57/TCP open

Page 19: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Detecting Modes of Network Traffic Using Clustering

• Used Shared Nearest Neighbor (SNN) clustering
– Not distracted by “noise” in the data
– CPU intensive: O(N²)
– Requires storing an N x K matrix
• K (number of neighbors) is typically between 10 and 20
• K should be about the size of the smallest expected mode
• Clustered 850,000 connections collected over one hour at one US Army fort
• Took 10 hours on a 16-CPU cluster
• Found 3,135 clusters
– Largest clusters around 500 records, smallest cluster 10 records
– Large clusters correspond to normal behavior
– Many small clusters correspond to policy violations or other undesired behavior
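The shared-nearest-neighbor idea can be illustrated with a small sketch. This is not the implementation used in the talk: it is a simplified Jarvis-Patrick-style variant that links two points when they appear in each other's K-nearest-neighbor lists and share enough neighbors, then reports connected components as clusters. The function names, the Euclidean distance, and the `min_shared` threshold are assumptions made for illustration. The brute-force neighbor search shows where the O(N²) cost comes from, and the per-point neighbor sets play the role of the N x K matrix.

```python
import math


def knn_lists(points, k):
    """Brute-force k-nearest-neighbor sets: O(N^2) distance evaluations."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def snn_similarity(neighbors, i, j):
    """Number of shared neighbors, counted only for mutual neighbors."""
    if i not in neighbors[j] or j not in neighbors[i]:
        return 0
    return len(neighbors[i] & neighbors[j])


def snn_clusters(points, k, min_shared):
    """Connected components of the graph of strong SNN links (union-find)."""
    nbrs = knn_lists(points, k)
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(points)):
        for j in nbrs[i]:
            if j > i and snn_similarity(nbrs, i, j) >= min_shared:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Two tight groups plus one isolated point (hypothetical 2-D data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
clusters = snn_clusters(points, k=3, min_shared=1)
```

In this toy data the isolated point is in nobody's neighbor list, so it ends up as a singleton rather than being pulled into a cluster, which is the sense in which the approach is not distracted by noise.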

Page 20: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Detecting Modes of Network Traffic Using Clustering

Large clusters of VPN traffic (hundreds of connections)

Used between forts for secure sharing of data and working remotely

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Packets  Bytes
20040407.10:00:00.428036  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:00:00.685520  0:00:03   A       -1        B       -1        gre    237  1        556
20040407.10:00:00.748920  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:01:44.138057  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:01:59.267932  0:00:00   A       -1        B       -1        gre    237  1        96
20040407.10:02:44.937575  0:00:01   A       -1        B       -1        gre    237  1        556
20040407.10:04:00.717395  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:04:30.976627  0:00:01   A       -1        B       -1        gre    237  1        556
20040407.10:04:46.106233  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:05:46.715539  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:06:16.975202  0:00:01   A       -1        B       -1        gre    237  1        556
20040407.10:06:32.105013  0:00:00   A       -1        B       -1        gre    237  1        556

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Packets  Bytes
20040407.10:00:40.685522  0:00:03   B       -1        A       -1        gre    237  1        96
20040407.10:00:58.748922  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:01:44.138059  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:02:14.678442  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:02:44.937577  0:00:01   B       -1        A       -1        gre    237  1        96
20040407.10:03:15.308206  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:04:30.976629  0:00:01   B       -1        A       -1        gre    237  1        96
20040407.10:06:16.975204  0:00:01   B       -1        A       -1        gre    237  1        96
20040407.10:06:32.105015  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:06:47.234837  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:07:02.367471  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:07:17.494574  0:00:00   B       -1        A       -1        gre    237  1        96

Page 21: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Clusters Involving GoToMyPC.com (Army Data)

Policy violation, allows remote control of a desktop

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Flags     Packets  Bytes
20040407.10:00:10.428036  0:00:00   A       4125      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:00:40.685520  0:00:03   A       4127      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:00:58.748920  0:00:00   A       4138      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:01:44.138057  0:00:00   A       4141      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:01:59.267932  0:00:00   A       4143      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:02:44.937575  0:00:01   A       4149      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:04:00.717395  0:00:00   A       4163      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:04:30.976627  0:00:01   A       4172      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:04:46.106233  0:00:00   A       4173      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:05:46.715539  0:00:00   A       4178      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:06:16.975202  0:00:01   A       4180      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:06:32.105013  0:00:00   A       4181      B       8200      tcp    123  ***AP*SF  5        248

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Flags     Packets  Bytes
20040407.10:00:40.685522  0:00:03   B       8200      A       4127      tcp    123  ***AP*SF  4        211
20040407.10:00:58.748922  0:00:00   B       8200      A       4138      tcp    123  ***AP*SF  4        211
20040407.10:01:44.138059  0:00:00   B       8200      A       4141      tcp    123  ***AP*SF  4        211
20040407.10:02:14.678442  0:00:00   B       8200      A       4145      tcp    123  ***AP*SF  4        211
20040407.10:02:44.937577  0:00:01   B       8200      A       4149      tcp    123  ***AP*SF  4        211
20040407.10:03:15.308206  0:00:00   B       8200      A       4153      tcp    123  ***AP*SF  4        211
20040407.10:04:30.976629  0:00:01   B       8200      A       4172      tcp    123  ***AP*SF  4        211
20040407.10:06:16.975204  0:00:01   B       8200      A       4180      tcp    123  ***AP*SF  4        211
20040407.10:06:32.105015  0:00:00   B       8200      A       4181      tcp    123  ***AP*SF  4        211
20040407.10:06:47.234837  0:00:00   B       8200      A       4182      tcp    123  ***AP*SF  4        211
20040407.10:07:02.367471  0:00:00   B       8200      A       4183      tcp    123  ***AP*SF  4        211
20040407.10:07:17.494574  0:00:00   B       8200      A       4184      tcp    123  ***AP*SF  4        211

Detecting Modes of Network Traffic Using Clustering

Page 22: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Clusters involving mysterious ping and SNMP traffic

Misconfigured computer subjected to SNMP surveillance

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  ICMP Type  ICMP Code  # Packets  # Bytes
20040407.10:01:00.181261  0:00:00   A       1176      B       161       udp    123  -          -          1          95
20040407.10:01:23.183183  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:02:54.182861  0:00:00   A       1514      B       161       udp    123  -          -          1          95
20040407.10:03:03.196850  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:04:45.179841  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:06:27.180037  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:09:48.420365  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:11:04.420353  0:00:00   A       3013      B       161       udp    123  -          -          1          95
20040407.10:11:30.420766  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:12:47.421054  0:00:00   A       3329      B       161       udp    123  -          -          1          95
20040407.10:13:12.423653  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:14:53.420635  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  ICMP Type  ICMP Code  # Packets  # Bytes
20040407.10:01:00.181488  0:00:00   B       161       A       1176      udp    63   -          -          1          103
20040407.10:01:23.183291  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:01:55.180590  0:00:00   B       161       A       1326      udp    63   -          -          1          234
20040407.10:02:54.184537  0:00:00   B       161       A       1514      udp    63   -          -          1          134
20040407.10:03:03.196958  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:04:45.179965  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:05:09.180542  0:00:00   B       161       A       1927      udp    63   -          -          1          234
20040407.10:06:27.180159  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:09:48.420410  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:11:30.420773  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:13:12.423663  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:14:53.421019  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84

Detecting Modes of Network Traffic Using Clustering

Page 23: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Clusters involving unusual repeated ftp sessions

Further investigations revealed a misconfigured Army computer was trying to contact Microsoft

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Flags     Packets  Bytes
20040407.10:10:57.097108  0:00:00   A       3004      B       21        tcp    123  ***AP*SF  7        318
20040407.10:11:27.113230  0:00:00   A       3007      B       21        tcp    123  ***AP*SF  7        318
20040407.10:11:37.111176  0:00:00   A       3008      B       21        tcp    123  ***AP*SF  7        318
20040407.10:11:57.118231  0:00:00   A       3011      B       21        tcp    123  ***AP*SF  7        318
20040407.10:12:17.125220  0:00:00   A       3013      B       21        tcp    123  ***AP*SF  7        318
20040407.10:12:37.132428  0:00:00   A       3015      B       21        tcp    123  ***AP*SF  7        318
20040407.10:13:17.146391  0:00:00   A       3020      B       21        tcp    123  ***AP*SF  7        318
20040407.10:13:37.153713  0:00:00   A       3022      B       21        tcp    123  ***AP*SF  7        318
20040407.10:14:47.178228  0:00:00   A       3031      B       21        tcp    123  ***AP*SF  7        318
20040407.10:15:47.199100  0:00:00   A       3040      B       21        tcp    123  ***AP*SF  7        318
20040407.10:16:07.206450  0:00:00   A       3042      B       21        tcp    123  ***AP*SF  7        318

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Flags     Packets  Bytes
20040407.10:00:06.627895  0:00:01   B       21        A       2924      tcp    123  ***AP*SF  7        449
20040407.10:00:16.633872  0:00:01   B       21        A       2925      tcp    123  ***AP*SF  7        449
20040407.10:00:36.638794  0:00:01   B       21        A       2927      tcp    123  ***AP*SF  7        449
20040407.10:01:16.652664  0:00:01   B       21        A       2932      tcp    123  ***AP*SF  7        449
20040407.10:01:26.659694  0:00:01   B       21        A       2933      tcp    123  ***AP*SF  7        449
20040407.10:01:56.666816  0:00:01   B       21        A       2937      tcp    123  ***AP*SF  7        449
20040407.10:02:06.670680  0:00:01   B       21        A       2938      tcp    123  ***AP*SF  7        449
20040407.10:02:56.687932  0:00:01   B       21        A       2944      tcp    123  ***AP*SF  7        449
20040407.10:03:26.698413  0:00:01   B       21        A       2947      tcp    123  ***AP*SF  7        449
20040407.10:04:06.712495  0:00:01   B       21        A       2952      tcp    123  ***AP*SF  7        449
20040407.10:05:06.733731  0:00:01   B       21        A       2961      tcp    123  ***AP*SF  7        449
20040407.10:06:16.758442  0:00:01   B       21        A       2969      tcp    123  ***AP*SF  7        449
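One way to see why these sessions stand out as "repeated" is to look at the gaps between consecutive start times. The sketch below is an illustration and not part of the talk: it parses the first five A to B start times listed on this slide and rounds each gap to the nearest 10 seconds. The helper name `start_gaps` and the 10-second rounding are assumptions chosen for this example.

```python
from datetime import datetime


def start_gaps(start_times):
    """Return the gaps, in seconds, between consecutive flow start times."""
    fmt = "%Y%m%d.%H:%M:%S.%f"  # e.g. 20040407.10:10:57.097108
    parsed = [datetime.strptime(t, fmt) for t in start_times]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


# First five A -> B session start times listed on this slide.
starts = [
    "20040407.10:10:57.097108",
    "20040407.10:11:27.113230",
    "20040407.10:11:37.111176",
    "20040407.10:11:57.118231",
    "20040407.10:12:17.125220",
]

gaps = start_gaps(starts)
# Round each gap to the nearest multiple of 10 seconds (illustrative choice).
rounded = [round(g / 10) * 10 for g in gaps]
```

For these rows every gap falls within a small fraction of a second of a multiple of 10 seconds. That regularity is an observation about the sample shown here, and it is the kind of machine-generated pattern one would expect from a misconfigured host retrying on a timer.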

Detecting Modes of Network Traffic Using Clustering

Page 24: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Need for HPC

• Very large data size
– Typical network traffic at the university level reaches around 500 million connections per day
• Compute-intensive nature of the pattern-finding algorithms
– Associative analysis
– Clustering
– Sequential pattern analysis
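A rough back-of-the-envelope calculation shows how quickly the cost grows. The sketch below combines two figures quoted in this talk: the SNN run that clustered 850,000 connections in 10 hours on a 16-CPU cluster, and the 500 million connections per day seen at a university. It assumes pure O(N²) scaling on the same hardware with no algorithmic improvements, which is a simplification made only for illustration.

```python
def quadratic_scaling_hours(base_n, base_hours, target_n):
    """Extrapolate run time for an O(N^2) algorithm from one measured run."""
    return base_hours * (target_n / base_n) ** 2


# Figures quoted in the talk: 850,000 connections took 10 hours on 16 CPUs.
hours = quadratic_scaling_hours(base_n=850_000, base_hours=10,
                                target_n=500_000_000)
years = hours / (24 * 365)
print(f"{hours:,.0f} cluster-hours, about {years:,.0f} years on the same cluster")
```

Under these assumptions a single day of university traffic would need millions of cluster-hours, which is why both more computing power and better-scaling algorithms are needed.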

Page 25: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Need for Distributed Intrusion Detection

• Attacks on the network infrastructure may be launched from several different locations and may target multiple destinations

• Stealthy coordinated attacks with low traffic volumes are difficult to detect by IDSs based at a single network site

• Detection of such attacks at an early stage requires correlation of data from multiple network sites
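The correlation step can be sketched in a few lines. This is a hypothetical illustration, not the MINDS implementation: each site is assumed to report (source IP, local anomaly score) pairs, and a source is flagged when it stays below a local alerting threshold everywhere yet appears at several sites. The function name, the score scale, both thresholds, and the example addresses (taken from documentation-reserved ranges) are all assumptions.

```python
from collections import defaultdict


def correlate_across_sites(site_alerts, max_local_score=0.5, min_sites=2):
    """Find sources that look weak at every site but show up at several sites.

    site_alerts maps a site name to a list of (source_ip, local_score) pairs.
    The score scale and both thresholds are hypothetical.
    """
    sites_by_source = defaultdict(set)
    peak_score = defaultdict(float)
    for site, alerts in site_alerts.items():
        for source_ip, score in alerts:
            sites_by_source[source_ip].add(site)
            peak_score[source_ip] = max(peak_score[source_ip], score)
    return {
        ip: sorted(sites)
        for ip, sites in sites_by_source.items()
        if len(sites) >= min_sites and peak_score[ip] <= max_local_score
    }


# Hypothetical per-site reports.
alerts = {
    "site_a": [("198.51.100.7", 0.3), ("203.0.113.9", 0.9)],
    "site_b": [("198.51.100.7", 0.2)],
    "site_c": [("198.51.100.7", 0.4), ("192.0.2.44", 0.1)],
}
stealthy = correlate_across_sites(alerts)
```

Here only `198.51.100.7` is returned: no single site would raise an alert on it, but it appears at all three. The loud source `203.0.113.9` would already be caught locally, and `192.0.2.44` is seen at just one site.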

[Figure: network diagram with five MINDS nodes]

Page 26: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Map of the Global IP Space

Page 27: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Suspicious Traffic on Port 80

[Figure panels: source IPs of suspicious connections in the global IP space; destination IPs of suspicious connections within the 3 class B networks at the U of M]

999 unique sources, 1126 unique destinations, 1516 total flows involved

Legend: + failed connections, O successful connections

Page 28: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Page 29: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Suspicious Traffic on Port 445

[Figure panels: source IPs of suspicious connections in the global IP space; destination IPs of suspicious connections within the 3 class B networks at the U of M]

7982 unique sources, 6184 unique destinations, 9930 total flows involved

Legend: + failed connections, O successful connections

Page 30: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Page 31: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Need for Grid-based IDS

• Centralizing data is not possible
– Data needed for analysis is distributed
– Cost of centralizing data is too high
– Security and privacy issues
• Computational resources needed for analysis are distributed

[Figure: three networks connected to the Internet]

How to detect a distributed network attack?

Page 32: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Data Mining Middleware for Grids

[Figure: middleware architecture diagram with the following components]

• Application
• Data Mining and Exploration Services
• Execution, Representation and Management Systems (e.g., Chimera, Pegasus)
• Scheduling and Replication Services
• Data and Policy Management Services (appears twice)
• Data & Model Transport Services
• Grid Control Services
• Grid and Web Services (e.g. Globus, XML-RPC, DWTP, Condor) (appears twice)
• Data Services (e.g. JDBC, SQL, SRB) (appears twice)

NSF/ITR funded project jointly with B. Grossman, S. Ranka, and J. Weissman

Page 33: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Grid-Based Data Mining: Distributed Network Intrusion Detection

[Figure: the middleware architecture diagram applied to three networks connected to the Internet, with the following components]

• Application
• Data Mining and Exploration Services
• Execution, Representation and Management Systems
• Scheduling and Replication Services
• Data and Policy Management Services (appears twice)
• Data & Model Transport Services
• Grid Control Services
• Grid and Web Services (appears twice)
• Data Services (appears twice)

Callouts in the figure:

• Detection of attack by correlating suspicious events across sites.
• Locate computing resources needed for time-critical execution of the data mining query.
• Needed to protect privacy, but allow necessary data access.

Page 34: Keynote talk at PDCS – September 16,2004 Parallel and Distributed Computing for Cyber Security Vipin Kumar Army High Performance Computing Research Center

Publications

• Managing Cyber Threats: Issues, Approaches and Challenges, edited by V. Kumar, J. Srivastava, and A. Lazarevic, Kluwer Academic Publishers (forthcoming).

• MINDS - Minnesota Intrusion Detection System, L. Ertöz, E. Eilertson, A. Lazarevic, P. Tan, J. Srivastava, V. Kumar, P. Dokas, in Data Mining: Next Generation Challenges and Future Directions, editors: H. Kargupta, A. Joshi, K. Sivakumar, Y. Yesha, MIT/AAAI Press, 2004, AHPCRC Technical Report # 2003-121

• Detection of Novel Network Attacks Using Data Mining, L. Ertöz, E. Eilertson, A. Lazarevic, P. Tan, P. Dokas, V. Kumar, J. Srivastava, Workshop on Data Mining for Computer Security, IEEE International Conference on Data Mining, Melbourne, FL, November 19, 2003, AHPCRC Technical Report # 2003-108

Visit http://www.cs.umn.edu/~kumar for further information