network intrusion detection dean final, actual version

Post on 21-Feb-2017


Network Intrusion Detection

By: Jack Song, Julina Zhang, Kerry Jones

Advisors: Dr. Don Brown, Dr. Hyojung Kang, Dr. Malathi Veeraraghavan

Client: UVA Information Security, Policy, and Records Office (ISPRO)

Sponsors: UVA SEAS / Leidos

Agenda

● Team Members
● Project Objectives
● Progress to Date
● Deliverables
● Potential Sponsors

Team Members - Data Science Institute

Jack Song
● Majored in Computer Science at UVA

Julina Zhang
● Majored in Statistics and Economics at UVA

Kerry Jones
● Majored in Government and Geography at UMD

Team Members - Advisors

Dr. Donald E. Brown
● Director of the Data Science Institute
● Dept. of Systems and Information Engineering

Dr. Malathi Veeraraghavan
● Dept. of Electrical & Computer Engineering

Dr. Hyojung Kang
● Dept. of Systems and Information Engineering

Team Members

Jason Belford
● Chief Information Security Officer

Jeff Collyer
● Information Security Engineer

Team Members

Sourav Maji
● Third-year PhD student in Computer Engineering

Ron Hutchins
● Vice President for Information Technology

Objectives

● Detect anomalous traffic leaving the UVA network using machine learning and data mining.

● Develop a network intrusion detection prototype.


Background - Approaches

● Lancope StealthWatch
● Previous approaches
○ Density-based Spatial Clustering of Applications with Noise (Erman, Arlitt, Mahanti)
○ K-Means Clustering (Erman, Arlitt, Mahanti)
○ One-class Support Vector Machine (Locke, Wang, Paschalidis)
○ Neural Network (Locke, Wang, Paschalidis)
○ Hierarchical Clustering (Ling, Rosti, Swanson)
○ Isolation Forest (Liu, Ting, Zhou)
● Our approach
○ Isolation Forest - An unsupervised learning method that utilizes a tree structure to isolate anomalies.

Our progress, at a glance

[Figure: progress plotted over the course of time, in three phases]

Initial data
- ISPRO
- Preprocessing
- Wireshark
- Filtering

Filtered data
- Unsupervised methods
- Isolation Forest
- Didn’t work out well

NetFlow data
- Collection server
- Power Edge
- TShark
- Conversation data
- Better ‘Unit’
- Preliminary results

Initial data phase

Data from ISPRO

+ Data preprocessing

+ Data filtering by source IPs within UVA network

Result: a subset of the packet capture data containing all connections initiated within the UVA network
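The filtering step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (`src_ip`, `dst_ip`, `length`), the helper names, and the campus prefix are hypothetical, and the prefix is a reserved documentation range rather than UVA's real address space.

```python
import csv
import io
import ipaddress

# Hypothetical campus prefix (a reserved documentation range, not UVA's real one).
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

def is_internal(address):
    """Return True when the address falls inside any campus prefix."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in CAMPUS_NETWORKS)

def filter_internal_sources(rows):
    """Keep only packets whose source IP is inside the campus network."""
    return [row for row in rows if is_internal(row["src_ip"])]

# Tiny in-memory stand-in for a packet CSV exported from a capture.
SAMPLE_CSV = """src_ip,dst_ip,length
192.0.2.10,198.51.100.7,60
203.0.113.5,192.0.2.10,1500
192.0.2.44,198.51.100.9,400
"""

packets = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
internal_packets = filter_internal_sources(packets)
print(len(internal_packets))
```

Matching against network prefixes with the standard `ipaddress` module avoids fragile string comparisons on dotted addresses.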

Initial Data Preprocessing

[Diagram: ISPRO data (1 TB) → one pcap file (50 GB / 6 min) → Wireshark/TShark (50 GB → 5 GB; .pcap → .csv) → Python script → summary statistics; algorithms]

Filtered data phase

Result from the previous phase

+ Created source-destination IP pairs

+ Calculated frequency and mean length for each pair

+ Isolation Forest

Provided an initial view, but more is needed.
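As a rough sketch of the pair features described above, the snippet below counts packets and averages their length for each source-destination pair. The function name `pair_features`, the tuple layout, and the sample addresses are assumptions for illustration; the slides do not show the team's actual script.

```python
from collections import defaultdict

def pair_features(packets):
    """Return {(src, dst): (packet_count, mean_length)} for each IP pair."""
    totals = defaultdict(lambda: [0, 0])  # pair -> [count, total length]
    for src, dst, length in packets:
        entry = totals[(src, dst)]
        entry[0] += 1
        entry[1] += length
    return {pair: (count, total / count) for pair, (count, total) in totals.items()}

# Hypothetical packets: (source IP, destination IP, length in bytes).
sample_packets = [
    ("192.0.2.10", "198.51.100.7", 60),
    ("192.0.2.10", "198.51.100.7", 100),
    ("192.0.2.44", "198.51.100.9", 400),
]
features = pair_features(sample_packets)
print(features)
```

Each pair then becomes one two-dimensional observation (frequency, mean length) that an anomaly detector such as Isolation Forest can score.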

Filtered data phase, what we’ve learned

Packet capture data ONLY captures individual packets

+ Need to capture the entire user session

Need NetFlow record data

NetFlow data phase -- Now

● Setting up a collection server

○ Power Edge

● Conversation data & TShark

● Better ‘Unit of comparison’

○ include port number

● Preliminary analysis
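To illustrate why including the port number gives a better unit of comparison, the sketch below groups the same conversations with and without a port in the key. Using the destination port specifically is an assumption (the slide only says "include port number"), and the field names and `unit_key` helper are hypothetical.

```python
from collections import Counter

def unit_key(conversation, include_port=True):
    """Build the grouping key for one conversation record."""
    key = (conversation["src"], conversation["dst"])
    if include_port:
        key += (conversation["dst_port"],)
    return key

# Hypothetical conversations between the same two hosts on different ports.
conversations = [
    {"src": "192.0.2.10", "dst": "198.51.100.7", "dst_port": 80},
    {"src": "192.0.2.10", "dst": "198.51.100.7", "dst_port": 80},
    {"src": "192.0.2.10", "dst": "198.51.100.7", "dst_port": 53},
]

by_pair = Counter(unit_key(c, include_port=False) for c in conversations)
by_pair_and_port = Counter(unit_key(c) for c in conversations)
print(len(by_pair), len(by_pair_and_port))
```

With the port in the key, web traffic and DNS traffic between the same hosts are no longer averaged together, so unusual behavior on one service is easier to see.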

Summary Statistics

Metric                                   Value
Count                                    157,313
Unique Source IPs                        11,514
Unique Destination IPs                   13,113
Unique Destination Ports                 1,631
Unique Source Ports                      48,925
Average Duration                         31 seconds
Average Packets, Source to Destination   34 packets
Average Packets, Destination to Source   31 packets
Average Bytes, Source to Destination     10,172 bytes
Average Bytes, Destination to Source     58,134 bytes

Top Five Most Frequently Used Destination Ports

Destination Port   Count    Number of Unique Source IP Pairs
80 (HTTP)          66,390   11,238
443 (HTTPS)        38,422   954
25 (SMTP)          24,277   39
6                  20,387   1
3                  957      2
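A ranking like the one above could be produced roughly as follows. This is a sketch, not the team's script: the function name and record layout are hypothetical, and it assumes the third column counts distinct source IPs per destination port.

```python
from collections import Counter, defaultdict

def top_destination_ports(conversations, n=5):
    """Return (port, conversation_count, unique_source_count) for the n busiest ports."""
    counts = Counter()
    sources = defaultdict(set)
    for src, dst_port in conversations:
        counts[dst_port] += 1
        sources[dst_port].add(src)
    return [(port, count, len(sources[port])) for port, count in counts.most_common(n)]

# Hypothetical conversations: (source IP, destination port).
sample = [
    ("192.0.2.10", 80),
    ("192.0.2.11", 80),
    ("192.0.2.10", 80),
    ("192.0.2.10", 443),
    ("192.0.2.11", 443),
    ("192.0.2.12", 25),
]
ranking = top_destination_ports(sample, n=2)
print(ranking)
```

A port with a high count but very few distinct sources, like the last rows of the table, is a natural candidate for closer inspection.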

NetFlow data phase, next steps

● Finish setting up Power Edge
○ Shell script
○ Cron job
■ Automation of daily data collection
● Go into specifics, “symptoms”
○ DNS tunneling
○ Phishing

Identified Cyber Security Needs

● Identifying anomalous behavior in traffic leaving the UVA network

○ Source data: NetFlow records

○ Traffic from hosts with static public IP addresses

● DNS Tunneling

○ Data theft using port 53 as a pathway

● Phishing Attack

○ Attempts to obtain sensitive information through disguise and baiting.
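One simple way to look for the DNS tunneling symptom is to total the bytes each host sends to port 53 and flag unusually large senders. The sketch below is a hypothetical heuristic, not the method used in this project: the field names, the `suspected_dns_tunnels` helper, and the byte threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict

DNS_PORT = 53
BYTES_THRESHOLD = 50_000  # hypothetical cutoff; a real value would need tuning

def suspected_dns_tunnels(flows, threshold=BYTES_THRESHOLD):
    """Return source IPs whose total outbound bytes to port 53 exceed the threshold."""
    sent = defaultdict(int)
    for flow in flows:
        if flow["dst_port"] == DNS_PORT:
            sent[flow["src"]] += flow["bytes_out"]
    return sorted(src for src, total in sent.items() if total > threshold)

# Hypothetical flow records in a NetFlow-like shape.
flows = [
    {"src": "192.0.2.10", "dst_port": 53, "bytes_out": 40_000},
    {"src": "192.0.2.10", "dst_port": 53, "bytes_out": 30_000},
    {"src": "192.0.2.11", "dst_port": 53, "bytes_out": 300},
    {"src": "192.0.2.12", "dst_port": 443, "bytes_out": 900_000},
]
flagged = suspected_dns_tunnels(flows)
print(flagged)
```

Ordinary DNS queries are small, so sustained high outbound volume on port 53 is worth a second look, though a fixed cutoff like this will miss slow exfiltration.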

Challenges

1. Domain knowledge
2. Size of data
a. 36 min of data, approx. 270 GB
3. IP addresses
a. Dynamic vs. Static
b. Private vs. Public
4. Unlabeled data → unsupervised learning

Deliverables

● Paper
● Network intrusion detection prototype
● Shell script

Potential Sponsors

● NSF Cybersecurity Innovation for Cyberinfrastructure (CICI)

● NSF Secure and Trustworthy Cyberspace (SaTC) programs

● DHS CyberSecurity Division programs

● DOE Cybersecurity for Energy program

● Industry, specifically NTT Labs and Cisco


References

1. Ashfaq, Rana Aamir Raza, et al. "Fuzziness Based Semi-Supervised Learning Approach for Intrusion Detection System." Information Sciences (2016).

2. Fung, Carol, and Raouf Boutaba. Intrusion Detection Networks. CRC Press, 2013.

3. Fung, Carol, and Raouf Boutaba. Intrusion Detection Networks: A Key to Distributed Security. CRC Press, 2013.

4. Erman, Jeffrey, Martin Arlitt, and Anirban Mahanti. "Traffic Classification using Clustering Algorithms." Proceedings of the 2006 SIGCOMM workshop on Mining network data. ACM, 2006. 281-286.

5. Farnham, Greg. "Detecting DNS Tunneling." SANS Institute InfoSec Reading Room. 2013.

6. Grimes, Robert. Detect network anomalies with StealthWatch. 2014. IDG. 2016. <http://www.infoworld.com/article/2848768/security/detect-network-anomalies-with-stealthwatch.html>.

7. Locke, R., J. Wang, and I. Paschalidis. "Anomaly Detection Techniques for Data Exfiltration Attempts." Boston University Center for Information and Systems Engineering, 2012.

8. Sommer, Robin, and Vern Paxson. "Outside the Closed World: On using Machine Learning for Network Intrusion Detection." 2010 IEEE symposium on security and privacy (2010).

9. Ling, Yuning, Marcus Rosti, and Gregory Swanson. "A Hands-off Approach to Network Intrusion Detection." IEEE Systems and Information Engineering Design Conference (SIEDS). Charlottesville: IEEE, 2016. 216-220.

10. Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation-based anomaly detection." ACM Transactions on Knowledge Discovery from Data (TKDD) 6.1 (2012): 3.

Isolation Forest

• Unsupervised learning method
• Builds an ensemble of iTrees for a given data set.
• Anomalies are the observations with the shortest average path length from the root node.
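The idea above can be sketched in plain Python. This is a simplified teaching version under stated assumptions, not the implementation used for the project's results: the function names, the synthetic (frequency, mean length) data, and the early stop when a chosen feature has no spread are all illustrative choices. The score follows the 2^(-E[h(x)] / c(n)) form from Liu, Ting, and Zhou, where a shorter average path gives a score closer to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree by splitting on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:  # simplification: stop when the chosen feature cannot split
        return {"size": len(points)}
    split = rng.uniform(low, high)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(point, node, depth=0):
    """Depth at which the point lands, adjusted for points left unsplit in the leaf."""
    if "size" in node:
        return depth + c_factor(node["size"])
    branch = "left" if point[node["feature"]] < node["split"] else "right"
    return path_length(point, node[branch], depth + 1)

def fit_forest(points, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the data."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(points, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, size

def anomaly_score(point, trees, size):
    """Score in (0, 1); values near 1 mean the point is isolated after few splits."""
    mean_path = sum(path_length(point, tree) for tree in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(size))

# Synthetic (frequency, mean length) observations plus one obvious outlier.
data_rng = random.Random(1)
normal = [(data_rng.uniform(1, 20), data_rng.uniform(60, 200)) for _ in range(40)]
outlier = (5000.0, 1500.0)
points = normal + [outlier]

trees, size = fit_forest(points)
scores = [anomaly_score(p, trees, size) for p in points]
most_anomalous = points[scores.index(max(scores))]
print(most_anomalous)
```

For real data, a maintained implementation such as the `IsolationForest` estimator in scikit-learn is likely a better choice than this sketch.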

Preliminary Results of iForest
