4 full chapter margin.pdf


Upload: cendolpuluit

Post on 11-Nov-2015


  • 1

    CHAPTER I: INTRODUCTION

    1.1 Overview

    This project concerns network forensics, which allows the details of
    network events to be recovered after they have happened, and shows how to
    analyze VoIP attack data patterns using WEKA, a data mining tool. WEKA is
    used to examine captured network traffic in order to investigate network
    and security attacks or application performance issues. From the data
    patterns, an investigation is conducted to reveal information about network
    and application interactions, user sessions, response times and latency
    metrics. The investigation also aims to establish the source of each
    attack, when it happened and what type of attack it was. Tracking down an
    attacker requires keeping extensive records of network activity, with the
    help of an intrusion detection system.

    The gathered data help to find a solution for each attack so that it can
    be prevented from happening again. The data analysis also reveals who
    communicated with whom, when, and how often. This information can be used
    as evidence by victims who wish to take further action against the parties
    that committed network crimes against them.

    1.2 Project Objective

    The main objective of this project is to analyze the patterns of attack
    data in the captured traffic. The data indicate the condition of the
    network events, so that the source of attacks or other problem incidents
    can be discovered. The analysis helps to identify unauthorized access to a
    computer system and searches for evidence of other types of threats or
    attacks.

  • 2

    The second objective is to convert the pcap data to an ARFF data file that
    can be read by the WEKA data mining tool. The first objective cannot be
    achieved unless the second objective is met.

    1.3 Project Scope

    This project focuses on VoIP attack data patterns analyzed using WEKA, a
    data mining tool. Denial-of-Service (DoS), Spam over Internet Telephony
    (SPIT) and Man-in-the-Middle (MiTM) attacks are the three main focuses of
    this project.

    1.4 Problem Statement

    The growth in network connectivity, complexity and activity has increased
    the number of crimes committed within networks. Emerging applications such
    as VoIP have worsened the situation. Knowing the attack patterns allows
    network administrators to protect their networks.

    VoIP is one of the newest technologies being rapidly embraced by the
    market as an alternative to the traditional Public Switched Telephone
    Network (PSTN). Common VoIP threats include network-based DoS,
    eavesdropping, attacks on signaling protocols and spam. These attacks can
    allow malicious people to listen in on others' conversations, and can make
    conversations unintelligible through network overload, packet loss or
    congestion that brings the network down. In addition, the bandwidth
    available to each application on the network is reduced because it is
    shared among the applications.

  • 3

    1.5 Problem Solving

    The problem can be addressed with a range of network tools. In this
    project WEKA, a data mining tool, is used to examine network traffic
    history in order to investigate the attacks and identify their source.

  • 4

    1.6 Chapter Organization

    This chapter describes the proposed project, VoIP data forensics using
    WEKA, a data mining tool. It gives an overview of how VoIP data forensics
    works and how attacks affect VoIP applications. More details about VoIP
    data forensics using WEKA are given in Chapter 4.

    Chapter 2 presents the literature review, which describes the research
    and findings related to this project.

    Chapter 3 discusses the research methodology and the specific methods used
    to design the project. It explains the methods and specifications used in
    this project and also presents the budget and costing.

    Chapter 4 discusses the testing and implementation of the project and
    explains how the project is implemented.

    Chapter 5 discusses the project verification. It presents the results of
    the project implementation and experiments. From this chapter, the reader
    will understand how the system runs and what its final output is.

    Chapter 6 is the conclusion of the project. Any other suggestions or

    enhancements will be listed in this chapter for future reference.

  • 5

    CHAPTER II: LITERATURE REVIEW

    This chapter discusses several subjects related to this project. The
    review starts with the definitions and concepts of VoIP, data mining and
    network forensics. The existing VoIP protocols and VoIP security issues are
    also examined. Work by other researchers in the area of study is then
    reviewed.

    2.1 Background

    2.1.1 Voice over Internet Protocol (VoIP)

    VoIP, also known as IP telephony, carries voice over an IP network using
    the Internet Protocol. With the growth in popularity and bandwidth of the
    Internet, VoIP allows phone calls to be routed over the Internet rather
    than the Public Switched Telephone Network (PSTN). VoIP converts the voice
    signal into digital signals that travel over the Internet. The voice signal
    is packetized and sent over the network packet by packet. Packetization
    involves compressing the caller's voice signal, transferring it over the IP
    network and decompressing it at the receiving end [1]. VoIP can therefore
    run on any data network that uses IP, such as Local Area Networks (LANs),
    the Internet and intranets [1].

    There are several reasons why VoIP telephony is becoming more attractive
    than the PSTN to telecommunication providers and users. Lower call cost is
    one of the main reasons. It is relatively cheap to make a long-distance
    call through a VoIP service rather than the PSTN, because network resources
    such as bandwidth, router CPU and memory are shared between applications on
    the Internet [2]. On a PSTN line, users pay for each minute spent on the
    phone. With the Internet as the backbone of VoIP,

  • 6

    the cost that the user has to pay is a monthly bill to an Internet service
    provider (ISP). Another reason is that VoIP services can be used for
    conference calls, as opposed to a conventional phone line on which only two
    parties can speak at a time. With VoIP, a conference can be set up with a
    whole team communicating in real time.

    Figure 2.1 shows a simple VoIP process. To send voice over the Internet,
    the voice data are compressed into small packets to reduce the amount of
    transmission capacity needed. These packets may travel in a different order
    and are put back in sequence at the other end. Packet loss can occur during
    transmission. To recover from it, a mechanism covers up the loss and
    rebuilds the data from the pieces of information that were received [3].

    VoIP also has other potential problems, such as increased security risks,
    including Denial of Service (DoS), and lower Quality of Service (QoS) [4].
    In the PSTN, a circuit or dedicated channel is set up between two points
    for the duration of the call. These telephony systems are based on copper
    wires carrying analog voice data over the dedicated circuits [5]. A set
    amount of bandwidth is

    Figure 2.1: VoIP Processing [3]

  • 7

    reserved when a call is established between the callers, for as long as
    the connection is active. One of the main problems with PSTN technology is
    that the 64 kbps of bandwidth is reserved even when no data is being sent
    and the entire bandwidth is not needed. The actual bandwidth requirement is
    usually only a small fraction of what is reserved [4].
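
    The 64 kbps figure is consistent with standard PCM digitization of a
    voice channel. Assuming 8,000 samples per second and 8 bits per sample
    (values taken from common telephony practice, not stated in this report),
    the reserved rate works out as:

```latex
\[
R = f_s \times b
  = 8000\ \tfrac{\text{samples}}{\text{s}} \times 8\ \tfrac{\text{bits}}{\text{sample}}
  = 64\,000\ \text{bit/s} = 64\ \text{kbps}
\]
```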

    VoIP telephony relies on various methods and protocols to establish calls
    and transmit data. Most VoIP implementations, however, use the Session
    Initiation Protocol (SIP) and the Real-time Transport Protocol (RTP). SIP
    is a text-based application-layer protocol used for call initiation, call
    teardown and other call-related data sent during the conversation [4].
    Besides initiating and tearing down calls, SIP is also used for adding more
    users to a conference call. VoIP that uses SIP relies on a SIP proxy server
    to authenticate users' login credentials. This proxy is also used to signal
    the data and route the call, and it acts as a registrar that is used to
    locate other users [4].
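
    Because SIP is text based, a request can be inspected with ordinary
    string handling. The following is a minimal sketch, not the parsing used in
    this project: the sample INVITE, its addresses and the function name
    parse_sip_request are made up for illustration.

```python
# Hypothetical sample request; all addresses, tags and IDs are invented.
SAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.example.org;branch=z9hG4bK776asdhds\r\n"
    "From: Alice <sip:alice@example.org>;tag=1928301774\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@pc33.example.org\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_sip_request(raw: str) -> dict:
    """Split a SIP request into method, request URI, version and headers."""
    # Headers end at the first empty line; anything after it is the body.
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Start line: METHOD SP Request-URI SP SIP-Version
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "uri": uri, "version": version, "headers": headers}


msg = parse_sip_request(SAMPLE_INVITE)
print(msg["method"], msg["uri"], msg["headers"]["call-id"])
```

    The method (INVITE for call initiation, BYE for teardown) and the From
    and Call-ID headers are the kind of fields a forensic analysis would
    typically extract as attributes.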

    RTP provides generic transport capabilities for real-time multimedia
    applications, supporting both conversational and streaming applications
    such as video conferencing, video-on-demand, Internet telephony, Internet
    radio and music-on-demand. RTP is carried in User Datagram Protocol (UDP)
    packets to reduce overhead, giving greater transmission speed and better
    call quality.
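
    As an illustration of what travels inside each UDP datagram, the sketch
    below decodes the 12-byte fixed RTP header defined in RFC 3550. The header
    layout comes from that specification rather than from this report, and the
    example packet is synthetic.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least a 12-byte header")
    # Network byte order: two single bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Synthetic packet: version 2, payload type 0, followed by 160 payload bytes.
example = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x12345678) + b"\x00" * 160
print(parse_rtp_header(example))
```

    Gaps in the sequence field are one simple way to observe the packet loss
    discussed earlier in this section.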

    2.1.2 VoIP Attacks

    VoIP is clearly gaining ground over the PSTN, but security is a major
    concern for the VoIP community. Adding security mechanisms can degrade VoIP
    service performance. On the other hand, without security mechanisms, VoIP
    services would be open to threats and attacks [2]. Man-in-the-Middle
    (MiTM), Denial of Service (DoS) and Spam over Internet Telephony (SPIT) are
    among the attacks on VoIP.

  • 8

    In a MiTM attack, the attacker inserts himself between two communicating
    parties so that he can delete or modify the communications. MiTM attacks
    are a real security threat. For example, a MiTM in the VoIP signaling or
    media path can easily divert, wiretap and even hijack selected VoIP calls
    [6]. Such attacks can have serious effects on the targeted VoIP users; for
    example, attackers can collect sensitive information such as the victims'
    bank account numbers, credit card numbers and PINs. MiTM is a particular
    problem in Internet communication because there is no way to recognize
    someone's face or voice. In a live conversation, attackers are easier to
    expose because they cannot answer quickly when a suspicious victim tests
    the caller, for example by asking about a shared moment in their history
    [7]. MiTM attacks work against web-based systems because the web is not
    synchronous [7]: the attacker can simply relay the messages and pass
    through to the end of the communication.

    A DoS attack denies a service or connectivity on a network or device, or
    brings down the servers offering the service, by overloading the internal
    resources of the devices or the network. DoS attacks can be carried out by
    flooding a target with unnecessary SIP call-signaling messages, which can
    cause calls to drop prematurely and halt call processing [8]. The goal of a
    DoS attack is to make the service inoperable for as long as possible. By
    targeting the victim's computer and network connection, or those of the
    site the victim is trying to use, the attacker may be able to prevent the
    victim from accessing websites or other services. Floods are a common type
    of DoS attack. A flood happens when the attacker overloads the server with
    requests so that it cannot process the victim's request and the victim
    cannot access the site. The attacker can also send spam email messages to
    the victim's email account. Email services assign each account a specific
    quota, which limits the amount of data the account can hold at any given
    time [9]. The attacker can consume the victim's quota, preventing them

  • 9

    from receiving legitimate messages, by sending many or very large email
    messages to the account [9].
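
    A SIP flood of the kind described above leaves a simple trace in captured
    traffic: one source sends far more call-signaling messages per second than
    a normal caller. The sketch below flags such sources with a sliding time
    window. The function name, the one-second window and the threshold of 50
    messages are assumptions chosen for illustration, not values used in this
    project.

```python
from collections import defaultdict, deque


def find_flooders(events, window_s=1.0, threshold=50):
    """Return the sources that exceed `threshold` messages in any window.

    `events` is an iterable of (timestamp_in_seconds, source_ip) tuples,
    one per observed SIP INVITE, sorted by timestamp.
    """
    recent = defaultdict(deque)  # source -> timestamps still inside the window
    flagged = set()
    for ts, src in events:
        window = recent[src]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) > threshold:
            flagged.add(src)
    return flagged


# Synthetic traffic: one source sends 100 INVITEs in about a second,
# another sends one INVITE per second.
traffic = [(i * 0.01, "10.0.0.5") for i in range(100)]
traffic += [(float(i), "10.0.0.9") for i in range(5)]
print(find_flooders(sorted(traffic)))
```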

    SPIT is to VoIP what spam is to email: unwanted bulk calls or voicemails
    sent over VoIP networks [10]. SPIT may be a bigger problem to deal with
    than spam. It can cause bandwidth problems that multiply bandwidth bills
    several times, because voice messages are much larger than emails, which
    are only a few kilobytes apiece. SPIT also differs from spam in timing.
    Spam can be detected before it disturbs the recipient, whereas with SPIT
    the phone rings immediately after session initiation, and once it rings it
    is too late for prevention [10]. This disturbs the user's current activity.

    2.1.3 Network Forensic

    VoIP is an application that resides within the Internet environment. As
    the number of people using the Internet increases, the number of illegal
    activities such as identity theft and data theft also increases
    drastically. Network forensics deals with the capture, recording and
    analysis of network events. It makes it possible to analyze historical
    network traffic in order to investigate security attacks [11]. The gathered
    information helps to identify unauthorized access to a system and to find a
    solution that prevents it from happening again. This information can also
    be used as evidence if such an incident occurs.

    The main goal of network forensics is to provide evidence sufficient to
    allow the criminal perpetrator to be successfully prosecuted [12]. Network
    forensics requires two steps: first gathering complete network activity
    data, and then interpreting the data. Network activity data form the
    necessary foundation for a network forensics investigation, while
    interpreting forensic

  • 10

    network data could range from extracting files and reconstructing web sessions

    to tracing data leakage and detecting advanced persistent threats [13].

    2.1.4 WEKA Data Mining Tool

    Data mining is the process of analyzing data from different perspectives
    and summarizing it into useful information [14]; data mining software is
    one of a number of analytical tools for analyzing data. Data mining can be
    divided into two kinds, directed and undirected. Directed data mining tries
    to predict a particular data point, whereas undirected data mining tries to
    find patterns in existing data or to create groups of data [15]. Data
    mining has dozens of techniques and procedures that are used to examine and
    transform data. Its aim is to create a model that improves the way existing
    and future data are read and interpreted [15].

    The Waikato Environment for Knowledge Analysis (WEKA) is an open source
    data mining tool. WEKA is a collection of machine learning algorithms for
    data mining tasks and is a product of the University of Waikato, New
    Zealand [15]. The software is written in Java. It contains tools for data
    preprocessing, regression, clustering, classification, association rules
    and visualization [16]. WEKA uses a flat text file describing the data and
    can work with a variety of data files, including its own Attribute-Relation
    File Format (ARFF) and the C4.5 file format. ARFF is the default WEKA file
    type for data analysis, but data can also be imported from various other
    formats [17]. Data can also be read from a Structured Query Language (SQL)
    database or from a Uniform Resource Locator (URL).

  • 11

    2.2 Previous Work

    2.2.1 Skype Forensics in Android Devices

    In this research paper, Mohammed I. Al-Saleh and Yahya A. Forihat
    investigated the evidence left by Skype calls and chats on Android devices.
    Smartphones have capabilities similar to those of PCs and can store large
    amounts of data and different categories of information. Android-based
    smartphones are becoming more popular because of the wide variety of mobile
    applications (apps) that have been developed to extend the functionality of
    the phones. VoIP apps are used extensively because of their wide
    availability and low prices, and Skype is one of the popular VoIP apps.

    Figure 2.2: Investigation Model [18]

  • 12

    The paper assumes that Skype may be one of the means used to commit
    cybercrimes. Digital forensics may be conducted on mobile devices,
    computers and networks in order to detect cyber-criminal activities and
    prove them under the law. Fig. 2.2 shows the investigation model that the
    researchers designed. In the figure, the criminal starts a call or chat
    session with the victim. The investigator then needs to extract the
    conversation sessions from the criminal's device, obtaining evidence by
    inspecting both the RAM and the NAND flash memory [18].

    Across several experiments, the results showed no differences between the
    call conversation patterns. Chat messages were found in both memories, and
    the average number of occurrences decreased over the different time
    durations. This means that chat messages remained in the flash memory for a
    long time without redundancy, and the remaining messages can still be used
    as evidence. The researchers concluded that Skype conversation patterns and
    chat messages can be found in both the RAM and the NAND flash memory for a
    long time, regardless of deleting call and chat histories and signing out
    of Skype [18].

    2.2.2 Network Forensics Models for Converged Architectures

    A pattern is a solution to a problem that can be used to guide the design
    or evaluation of systems. The concept of a forensic pattern is introduced
    and illustrated using Unified Modeling Language (UML) object-oriented
    models. Attack patterns describe the objectives and steps of an attack.
    From these attack patterns, useful information can be obtained to work out
    ways of stopping the attacks. A forensic pattern is a systematic approach
    to network forensic collection and data analysis. Using forensic patterns
    gives investigators or forensic teams a structured method to search for,
    collect and analyze network forensic data.

  • 13

    Firewalls and Intrusion Detection Systems (IDS) are general security
    mechanisms that are unable to detect and stop attacks at a higher level. To
    stop such attacks in the future, details of the attackers' activities need
    to be collected and sent for analysis. Sensors with examination
    capabilities are used to collect evidence, which helps reduce human
    intervention. These sensors capture all voice packets entering or leaving
    the system. The evidence collector starts collecting forensic data when an
    alarm notifies it of a detected attack against VoIP components. After
    collecting the forensic data, the evidence collectors send the data to the
    network forensics server. These data are used to discover and reconstruct
    the attacking behavior. The forensics server performs the corresponding
    forensic analysis.

    Log correlation and normalization are among the techniques used to analyze
    forensic databases and files. The evidence analyzer presents results to the
    forensic investigator. These results include information such as the IP
    address, the MAC address, the topology of the network and possibly the
    geographic location of the IP address. In Fig. 2.3, Juan C. Pelaez and
    Eduardo B. Fernandez describe how a forensic system and IP telephony are
    integrated. The model has three primary components: the forensic server,
    the evidence

    Figure 2.3 Class diagram for a VoIP network forensics system [19]

  • 14

    collector and the network investigator. The forensic pattern has several
    advantages: automated evidence analysis reduces the response times of
    forensic investigators; the analyzer can provide information about logs and
    for tracing back the attackers; and it can determine the call history,
    showing when a user is using the VoIP device and with whom the user
    communicates [19].
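
    The log normalization and correlation mentioned in this section can be
    pictured as two steps: map each log source onto a common record layout,
    then group records that share an IP address and occur close together in
    time. The sketch below assumes two hypothetical log sources ("ids" and
    "sip_proxy") with invented field names; it is not the design described in
    [19].

```python
from collections import defaultdict

# Hypothetical mapping from each source's field names to a common schema.
FIELD_MAPS = {
    "ids": {"time": "alert_time", "ip": "src_ip", "event": "signature"},
    "sip_proxy": {"time": "ts", "ip": "client", "event": "method"},
}


def normalize(record: dict, source: str) -> dict:
    """Rewrite one source-specific log record into the common schema."""
    fields = FIELD_MAPS[source]
    return {
        "time": float(record[fields["time"]]),
        "ip": record[fields["ip"]],
        "event": record[fields["event"]],
        "source": source,
    }


def correlate(records, max_gap_s=5.0):
    """Return {ip: records} for IPs seen by two different sources close in time."""
    by_ip = defaultdict(list)
    for rec in records:
        by_ip[rec["ip"]].append(rec)
    hits = {}
    for ip, recs in by_ip.items():
        recs.sort(key=lambda r: r["time"])
        for earlier, later in zip(recs, recs[1:]):
            close = later["time"] - earlier["time"] <= max_gap_s
            if close and earlier["source"] != later["source"]:
                hits[ip] = recs
                break
    return hits


ids_log = [{"alert_time": "100.0", "src_ip": "192.0.2.7", "signature": "SIP flood"}]
proxy_log = [
    {"ts": "101.5", "client": "192.0.2.7", "method": "INVITE"},
    {"ts": "300", "client": "198.51.100.2", "method": "REGISTER"},
]
records = [normalize(r, "ids") for r in ids_log]
records += [normalize(r, "sip_proxy") for r in proxy_log]
print(sorted(correlate(records)))
```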

    2.2.3 Security Patterns for Voice over IP Networks

    The authors [REFERENCE REQUIRED] discuss security attacks, relate them to
    the ways the system is used and provide some defense mechanisms. Four
    security patterns are presented, which provide good practices for VoIP by
    identifying and explaining the mechanisms needed. The patterns are VoIP
    Tunneling, Network Segmentation, Secure VoIP Call and Signed Authenticated
    Call. Unified Modeling Language (UML) is used to make the patterns easier
    to implement. There are three types of connection when using the IP
    protocol: PC-to-PC, PC-to-Telephone and Telephone-to-Telephone. VoIP uses
    the Real-time Transport Protocol (RTP) for transport, the RTP Control
    Protocol (RTCP) for reporting Quality of Service (QoS), and SIP, H.323 or
    the Media Gateway Control Protocol (MGCP) for signaling.

    In this journal article [REFERENCE REQUIRED], the authors present several
    attacks against the VoIP network: theft of service, IP spoofing,
    Denial-of-Service (DoS), masquerading, call interception, repudiation, call
    hijacking and brute force. The authors analyze these attacks in detail
    using the concept of attack patterns, taking forensic aspects into account.

  • 15

    Fig. 2.4 shows the relationships between the VoIP security patterns and
    related cryptographic patterns. Double boxes represent the patterns. The
    Network Segmentation pattern minimizes disruption in the event of an
    attack, so that critical voice traffic is not affected. The VoIP Tunneling
    pattern uses encryption to ensure data integrity and confidentiality in
    VoIP networks. Tunnels secure VoIP traffic transported over an external
    network and eliminate the risk of exposing the network. The Signed
    Authenticated Call pattern provides a suitable way of authenticating
    messages in VoIP and is the best countermeasure against theft-of-service
    attacks. The Secure VoIP Call pattern uses encryption and decryption of
    VoIP calls to provide confidentiality.

    The paper concludes that using VPNs and encrypting all voice traffic is
    the best security approach for VoIP. This ensures that critical voice
    traffic is unaffected if an attack occurs on the data network [20]. To
    enhance VoIP security further, filtering and firewalls can be implemented
    to control the traffic between the data VPN and the voice VPN [20].

    Figure 2.4: Relationships between VoIP security patterns [20]

  • 16

    2.2.4 Enhancing Forensic Investigation In Large Capacity Storage Devices Using

    WEKA: A Data Mining Tool

    This research project focuses on large data sets that can be handled by a
    data mining system. The WEKA data mining tools are studied to demonstrate
    the data mining methodology and thus obtain the data. The WEKA toolkit is
    flexible and easily extended. Because WEKA is written in Java, it is easy
    to use and portable. It supports data preprocessing and modeling
    techniques.

    WEKA is user friendly and provides a large set of functions and tools,
    including attribute selection, preprocessing filters, data clustering,
    classification, data visualization and association discovery. WEKA is free
    open source software that is available to all users and can be used to run
    individual experiments. WEKA supports various data formats, including ARFF,
    Comma-Separated Values (CSV) and the format accepted by decision induction
    algorithms.

    Fig. 2.5 presents the flow of data mining used in WEKA. Data are
    classified based on the attribute selection and then divided into clusters
    based on the type of grouping that the user selects. The output obtained
    after clustering gives the accuracy of the clustered data, which can be

    Figure 2.5: Flow of Data Mining Methodology in WEKA [21]

  • 17

    used for future predictions. Finally regression analysis describes how regression

    can be applied and results can be visualized.

    This research project imported bank data into WEKA. The source file can be
    in either .arff or .csv format. Fig. 2.6 shows the WEKA preprocessing
    window with the bank data. The data are saved to bank-data-final.arff after
    the parameters are set up. The project was implemented in four modules,
    which represent the stages and tasks of the data mining process:
    association, classification, clustering and regression [21].

    Figure 2.6: Preprocessing window [21]

  • 18

    2.3 Critical Analysis

    The following table is a review of the differences in the literature review.

    Table 2.1: Critical Analysis

    JOURNAL:        JOURNAL 1 [REFERENCE REQUIRED] | JOURNAL 2 [REFERENCE REQUIRED] | JOURNAL 3 [REFERENCE REQUIRED] | JOURNAL 4 [REFERENCE REQUIRED]
    RESEARCH DATA:  Skype | Converged Network | Converged Network | Bank Employee
    TOOLS
      SOFTWARE:
      HARDWARE:     X X
    VOIP ATTACKS
      DoS:          X X
      SPIT:         X X X X
      MiTM:         X X X X
    PROTOCOL
      SIP:          X X X
      RTP:          X X X

  • 19

    CHAPTER III: RESEARCH METHODOLOGY

    This chapter explains in detail the methodology used to complete this
    project and achieve its objectives. Section 3.1 introduces the methodology
    used in this project. Section 3.2 lists the hardware and software
    resources. The budget and costing of the tools are listed in Section 3.3.
    Sections 3.4 and 3.5 present the Work Breakdown Structure (WBS) and the
    project timeline as a Gantt chart, which consists of activity duration
    estimates and the project schedule.

    3.1 Rapid Application Development (RAD) Methodology

    The Rapid Application Development (RAD) methodology was selected as the
    methodology model because it is a suitable process for software development
    and can be used to represent the flow of each piece of work in this
    project. The methodology is based on iteration and prototyping. This
    project works with existing data and comprises analysis and reporting of
    that data, and the RAD process works best in cases where the data are
    known, the requirements can be defined and kept unchanged during
    development, and the functional requirements can be met within a short time
    frame [22]. In this project the RAD methodology consists of six phases:
    Initiation, Planning, Design, Testing and Implementation, Verification and
    Documentation.

  • 20

    The RAD methodology has several advantages, chiefly speed and quality. RAD
    increases the speed of development and decreases delivery time by focusing
    on converting requirements to code as quickly as possible [23]. Increased
    quality is a primary focus of RAD; quality is defined as both the degree to
    which a delivered application meets the needs of its users and the degree
    to which the delivered system has low maintenance costs. The use of
    automation tools and prototyping considerably reduces errors. Errors and
    omissions are detected in the early stages of development, thereby avoiding
    extra effort and cost [24].

    3.1.1 Initiation

    An initiation or feasibility study is conducted after approval is obtained
    from the FYP supervisor. During this first phase, the initiation phase, the
    project objective, project scope and current problem statement are
    identified. The feasibility study gathers all the findings and data related
    to the project. The findings draw on sources including the Internet, books,
    journals, articles and previous studies of similar projects or systems.
    From the research literature, various gaps can be identified,

    Figure 3.1: RAD model methodology

  • 21

    which can formulate a research question based on the research gaps and discuss

    how these projects are likely.

    3.1.2 Planning

    In the next phase, the planning phase, all of the work to be done is
    identified, including the hardware and software resource requirements and
    the research model, along with the strategy for implementing the project. A
    project plan is created outlining the activities, tasks, dependencies and
    timeframes, and a project budget is prepared by providing cost estimates
    for equipment and materials. The budget is used to monitor and control
    expenditure during project implementation. The project plan is shown in
    Fig. 3.3 and Fig. 3.4 on pages 7 and 8.

    3.1.3 Design

    During the third phase, the design phase, the hardware and software are defined and the .pcap data files are collected. The system architecture and topology are also designed in this phase; they show the workflow of the project and the process of converting the .pcap data files into a format that WEKA recognizes. Fig. 3.2 shows the architecture of the project.

  • 22

    The .pcap format is the most widely available file format for logging network traffic and can be read by almost any network analysis tool. Such captures hold large amounts of data that must be examined to find problems with the network. To be recognized by WEKA, the .pcap data files are first converted into a temporary .csv format using tshark, the Wireshark command-line tool. The .csv files are then converted into the .arff format supported by WEKA by editing them in a plain-text editor such as Notepad and saving the result as an .arff file.
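The tshark step can be sketched as follows. This is a minimal illustration rather than the command actually used in the project: the field list is an assumption chosen to mirror the seven attributes that WEKA later reports (No., Time, Source, Destination, Protocol, Length, Info), and the `_ws.col.*` field names depend on the tshark version installed.

```python
# Fields chosen to mirror the seven attributes WEKA reports later.
# Assumption: "_ws.col.Protocol" and "_ws.col.Info" are the column field
# names; older tshark builds spell them "col.Protocol" and "col.Info".
FIELDS = [
    "frame.number",
    "frame.time_relative",
    "ip.src",
    "ip.dst",
    "_ws.col.Protocol",
    "frame.len",
    "_ws.col.Info",
]


def build_tshark_command(pcap_path, fields=FIELDS):
    """Return the argument list for a tshark run that prints CSV to stdout."""
    cmd = [
        "tshark", "-r", pcap_path,   # read packets from a capture file
        "-T", "fields",              # print selected fields only
        "-E", "header=y",            # first row holds the field names
        "-E", "separator=,",         # comma-separated output
        "-E", "quote=d",             # double-quote every value
    ]
    for field in fields:
        cmd += ["-e", field]
    return cmd
```

The returned list can be passed to `subprocess.run` with standard output redirected to a .csv file; building it as a list avoids shell quoting problems with file names that contain spaces.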

    3.1.4 Testing & Implementation

    In this phase, the project architecture is tested to assess the effectiveness of the techniques used to convert the .pcap data files into a format that WEKA recognizes. Implementation starts once the .pcap data file has been successfully converted and the hardware and software requirements have all been gathered. All software installation and setup is completed in this phase; refer to Section 4.2 in Chapter IV on page 32. The collected data are then imported into WEKA and analyzed to obtain results. The data collected from the company were limited by the time available to us.

    Figure 3.2: Architecture Topology

  • 23

    3.1.5 Verification

    The fifth phase is verification. Here the results of the fourth phase are verified to determine whether the data and the implemented design meet the requirements of the project. If a failure occurs in the testing phase, the system is modified until it runs successfully. Conclusions can then be drawn based on the correctness and completeness of development and operation in the testing phase.

    3.1.6 Documentation

    The last phase is documentation, in which all the information and results related to the project are compiled into a final report, including corrections and amendments to the report before submission.

    3.2 Project Resources

    The project requires the following hardware and software. Table 3.1

    shows the hardware and Table 3.2 shows the software specifications. These are

    the minimum requirements needed to ensure the success of the simulator.

    3.2.1 Hardware Specifications

    | No. | Device | Quantity | Specifications |
    |-----|--------|----------|----------------|
    | 1 | Laptop | 1 | ASUS brand; Processor: Intel Core i3; RAM: 6.00 GB; OS: Microsoft Windows 7 |

    Table 3.1: Hardware Requirement

  • 24

    3.2.2 Software Specifications

    | No. | Software | Descriptions |
    |-----|----------|--------------|
    | 1 | WEKA | Version: 3.7.10 (latest version); License/Price: Free; OS: Windows 7, 8, XP, Vista, 2000; Programming Language: Java; Size: 25.9 MB |
    | 2 | Wireshark | Version: 1.10.1 (64-bit); License/Price: Free; OS: Windows 7, Vista, XP; Networking Software Tools |

    Table 3.2: Software Requirement

    3.3 Budget/Costing

    The following is a review of the budget and costing of the hardware and software requirements. Table 3.3 shows the estimated budget and costing for the hardware and Table 3.4 shows those for the software.

    3.3.1 Hardware Estimated Budget

    | No. | Equipment | Quantity | Price (RM) | Remark |
    |-----|-----------|----------|------------|--------|
    | 1 | Laptop | 1 | 1800 | Student's property |

    Table 3.3: Project costing for hardware

  • 25

    3.3.2 Software Estimated Budget

    | No. | Equipment | Quantity | Price (RM) | Remark |
    |-----|-----------|----------|------------|--------|
    | 1 | WEKA | 1 | - | Open source |
    | 2 | Wireshark | 1 | - | Open source |

    Table 3.4: Project costing for software

    3.4 Work Breakdown Structure (WBS)

    The following figure is the WBS, which contains the levels of the work breakdown structure and provides further definition and detail.

  • 26

    Figure 3.3: Work Breakdown Structure (WBS)

  • 27

    3.5 Project Timeline

    The project timeline in Fig. 3.4 shows the time taken to accomplish this project. It shows every phase of project development and the schedule that ensures the project meets its deadlines.

    Figure 3.4: Gantt chart

  • 28

    CHAPTER IV: TESTING AND IMPLEMENTATION

    This chapter explains the project testing and implementation stages. Section 4.1, the testing stage, discusses the conversion of pcap files into arff files and is divided into two subsections: Section 4.1.1 covers the conversion of pcap files into csv files, and Section 4.1.2 covers the conversion of csv files into arff files. Section 4.2 discusses ethical matters, and Section 4.3 discusses how the data are analyzed.

    4.1 Testing Stage

    This section defines the testing method for the project architecture topology. The project architecture is set up as shown in Figure 3.2 on page 22. This stage is important to ensure that the technique for converting pcap data files into a format recognized by WEKA is applied in a systematic manner.

    4.1.1 Pcap To Csv Conversion

    There is no direct conversion from pcap to arff format. The csv file serves as an intermediate between the pcap and arff files. Wireshark is used to convert pcap files into csv files.

  • 29

    Open the pcap file in Wireshark and, on the File menu, choose Export Packet Dissections. This menu item exports packets from the capture file to a file. In this case, choose CSV (Comma Separated Values packet summary), as shown in Figure 4.1 on page 29. Then save the file in csv format, as shown in Figure 4.2.

    Figure 4.1: Wireshark Export Packet Dissections

    Figure 4.2: Wireshark Export File

  • 30

    4.1.2 CSV to Arff conversion

    This step converts CSV to ARFF using WEKA. First, WEKA must be installed; it can be downloaded from http://www.cs.waikato.ac.nz/ml/weka and is free, open-source software. The WEKA GUI Chooser window looks like Figure 4.3. The ArffViewer option under the Tools menu is used to load the csv files into WEKA, as shown in Figure 4.3.

    Figure 4.3: Weka GUI Chooser

  • 31

    Open the csv file by changing Files of type to CSV data files (*.csv), as

    shown in Figure 4.4.

    Figure 4.4: ARFF-Viewer windows

    Figure 4.5: Weka Save Windows

  • 32

    Then choose Save as, delete ".csv" from the file name and replace it with ".arff", as in Figure 4.5. The csv file has then been converted to an arff file.
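The same conversion can be scripted. The sketch below is an illustrative stand-in for the ArffViewer steps, not the tool's own code: it assumes the csv file has a header row, treats a column as numeric only when every non-empty value parses as a number, and declares every other column as nominal. The function name `csv_to_arff` is hypothetical.

```python
import csv
import io


def _is_number(text):
    """Return True when the text parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def _quote(value):
    """Single-quote a value, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + _quote(relation), ""]
    numeric = []
    for col, name in enumerate(header):
        values = [row[col] for row in data if row[col] != ""]
        is_num = bool(values) and all(_is_number(v) for v in values)
        numeric.append(is_num)
        if is_num:
            lines.append("@attribute %s numeric" % _quote(name))
        elif not values:
            # Column with no values at all: fall back to a string attribute.
            lines.append("@attribute %s string" % _quote(name))
        else:
            # Nominal attribute: list each distinct value once, first seen first.
            distinct = list(dict.fromkeys(values))
            labels = ",".join(_quote(v) for v in distinct)
            lines.append("@attribute %s {%s}" % (_quote(name), labels))
    lines += ["", "@data"]
    for row in data:
        cells = []
        for col, cell in enumerate(row):
            if cell == "":
                cells.append("?")          # ARFF marker for a missing value
            elif numeric[col]:
                cells.append(cell)
            else:
                cells.append(_quote(cell))
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"
```

Declaring text columns as nominal keeps them usable as class attributes for classifiers such as J48, at the cost of very long value lists for free-text columns like Info.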

    4.2 Ethical Matters

    The ethical matters pertain to the data that we gathered from third-party companies. We mention them here to protect the companies and ourselves from future legal action should the data leak. The first company that we approached was a security company, through an employee who was one of our speakers during the UniKL Security Talk day. However, the company was unable to release the data because of its sensitivity. The official letter sent to the company is in Appendix X.

    The second attempt was through the Malaysian Computer Emergency Response Team (MyCERT), CyberSecurity Malaysia. After several phone calls over a few weeks, we received a response from the officer in charge of our request. We then sent a formal letter (refer to Appendix X) to arrange an interview with the officer. We also asked whether the organization could supply data related to our project. Unfortunately, it did not keep the type of data that relates to network attacks; instead, it provides advisories on what to do when an attack happens.

    The third attempt was to arrange an interview with Vigilnet, a company that provides VoIP analysis. The person in charge was away for a few weeks, though the company agreed to supply the data. In the end the company supplied us with VoIP data; however, the data were clean, with no trace of network or individual attacks. Nevertheless, we still use this data in one of the analyses.

  • 33

    4.3 Analysis Stage

    Towards understanding and improving forensic analysis processes, an analysis experiment was conducted in this stage on the collected VoIP attack data. This stage makes it possible to mark or discover the source of security attacks or other problem incidents.

    4.3.1 Analyze Using WEKA

    This stage focused on some common types of DoS attack, namely ICMP Echo flood, UDP flood and TCP SYN flood, together with data from reliable sources, using WEKA Explorer preprocessing, classification, clustering and attribute selection.

    4.3.1.1 ICMP Echo Flood

    4.3.1.1.1 Preprocessing

    The file was loaded into WEKA in the Preprocess window, as shown in Fig. 4.8, by clicking the Open file button and choosing the .arff file from the local file system.

    Figure 4.8: Weka Open File

  • 34

    Once the data is loaded, WEKA lists the attributes it recognizes in the Attributes window. The left panel of the Preprocess window shows the list of recognized attributes:

    No.: a number that identifies the order of the attributes as they appear in the data file.
    Selection tick boxes: allow attributes to be selected for the working relation.
    Name: the name of the attribute as it was declared in the data file.

    During the scan of the data, WEKA computes some basic statistics on each attribute. The following statistics are shown in the Selected attribute box on the right panel of the Preprocess window:

    Name: the name of the attribute.
    Type: most commonly Nominal or Numeric.
    Missing: the number (percentage) of instances in the data for which this attribute is unspecified.
    Distinct: the number of different values that the data contains for this attribute.
    Unique: the number (percentage) of instances in the data having a value for this attribute that no other instance has.
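The three counts defined above can be illustrated with a short sketch. It is not WEKA's code; the function name and the use of "?" as the missing-value marker (the ARFF convention) are assumptions made for the example.

```python
from collections import Counter


def attribute_summary(values, missing_marker="?"):
    """Compute Missing, Distinct and Unique counts for one attribute column."""
    present = [v for v in values if v != missing_marker]
    counts = Counter(present)
    return {
        # Instances where the attribute is unspecified.
        "missing": len(values) - len(present),
        # Number of different values that occur.
        "distinct": len(counts),
        # Number of values that occur in exactly one instance.
        "unique": sum(1 for n in counts.values() if n == 1),
    }
```

For a packet-number column holding the values 1 to 6, every value occurs once, so the sketch gives Missing 0, Distinct 6 and Unique 6.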

    Figure 4.9: Weka Selected Attribute Box

  • 35

    No. is numeric. The Selected attribute window therefore shows the following statistics for this attribute:

    Missing: 0 means that the attribute is specified for all instances (no missing values).
    Distinct: 6 means that No. takes six different values, one for each packet in the capture.
    Unique: 6 means that no two instances share the same value of No.

    Time is a numeric attribute, so WEKA reports statistics describing the distribution of its values: Minimum, Maximum, Mean and Standard Deviation. Minimum = 1 is the lowest time and Maximum = 2.075 is the highest. Comparing the result with the attribute table in destunreachble.csv confirms that the numbers in WEKA match the numbers in the table. Figure 4.11 shows the visualization of all attributes.

    Figure 4.10: Matched Attribute

  • 36

    4.3.1.1.2 Classification

    Classifiers in WEKA are the models for predicting nominal or numeric

    quantities.

    Figure 4.11: Attributes Visualization

    Figure 4.12: Classify Tab Windows

  • 37

    In Fig. 4.13, J48, WEKA's implementation of the C4.5 decision tree learner, is used to analyze the data sample. The C4.5 algorithm was chosen because it can handle numeric attributes.

    Figure 4.13: Weka J48 Algorithm Tree

    Figure 4.14: Classifier Test Option

  • 38

    In this data sample, the Percentage split radio button was selected and left at its default of 66%. A percentage split evaluates the classifier on how well it predicts the portion of the data that is held out for testing; the amount held out depends on the value entered in the % field. With the default, 66% of the instances are used for training and the remainder for testing. Once the options have been specified, the learning process is started by clicking the Start button.
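The effect of the 66% setting on the test set size can be sketched as below. This is an approximation under stated assumptions: the training size is taken to be the instance count times the percentage, rounded to the nearest integer, and the shuffle here does not reproduce WEKA's own ordering. With these assumptions, six instances leave two for testing and nine leave three, which agrees with the "Total Number of Instances" values reported in Chapter V.

```python
import math
import random


def percentage_split(instances, percent=66.0, seed=1):
    """Shuffle the instances and split them into (train, test) lists."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    # Round half up, as Java's Math.round does for positive values.
    train_size = int(math.floor(len(shuffled) * percent / 100.0 + 0.5))
    return shuffled[:train_size], shuffled[train_size:]
```

With very small captures the test set shrinks to two or three packets, so a single misclassification moves the reported accuracy by a third or a half.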

    4.3.1.1.3 Clustering

    Clustering in WEKA is for finding groups of similar instances in a dataset.

    Figure 4.15: Cluster Tab Windows

    Figure 4.16: Weka Gui Generic Object Editor Window

  • 39

    Once the cluster scheme SimpleKMeans is selected, a weka.gui.GenericObjectEditor window is opened by right-clicking the algorithm, as shown in Fig. 4.16. The value in the numClusters box was set to 7 to match the seven clusters expected in the .arff file.

    When training is complete, the Cluster output area on the right panel of the Cluster window is filled with text describing the results of training and testing. A new entry appears in the Result list box on the left.
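The idea behind k-means clustering can be sketched on one-dimensional data such as packet lengths. This is a plain version of Lloyd's algorithm written for illustration, not WEKA's SimpleKMeans: it starts from the first k distinct values instead of random seeds and handles numeric values only.

```python
def k_means_1d(points, k, iterations=100):
    """Cluster numbers into k groups; return (centroids, assignment)."""
    # Deterministic start: the first k distinct values.
    centroids = list(dict.fromkeys(points))[:k]
    if len(centroids) < k:
        raise ValueError("need at least k distinct points")
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: abs(p - centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignment
```

The value of k plays the same role as the numClusters box: it fixes how many groups the algorithm must produce, whether or not the data naturally fall into that many.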

    4.3.1.1.4 Attribute selection

    Attribute selection searches through all possible combinations of attributes in the

    data and finds which subset of attributes works best for prediction.

    Figure 4.17: Cluster Output

  • 40

    In Fig. 4.18, the CfsSubsetEval evaluator and the BestFirst search method were set up to search through combinations of attributes in the data and find which subset of attributes works best for prediction. When the attribute selection process is finished, the results are shown on the right part of the window, as in Fig. 4.19.
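The score that correlation-based feature selection assigns to a candidate subset can be sketched with the merit formula from the CFS literature, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean attribute-to-class correlation and r_ff the mean attribute-to-attribute correlation. The correlation values below are hypothetical inputs; how WEKA measures them internally is not shown here.

```python
import math


def cfs_merit(feature_class_corrs, feature_feature_corrs):
    """Merit of a subset from its attribute-class and attribute-attribute correlations."""
    k = len(feature_class_corrs)
    if k == 0:
        return 0.0
    r_cf = sum(feature_class_corrs) / k
    if feature_feature_corrs:
        r_ff = sum(feature_feature_corrs) / len(feature_feature_corrs)
    else:
        r_ff = 0.0  # a single attribute has no pairwise correlations
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

Adding an attribute that is fully redundant with one already selected leaves the merit unchanged, while adding an equally predictive but uncorrelated attribute raises it, which is why the method prefers small, non-redundant subsets.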

    Figure 4.18: Select Attribute Tab Windows

    Figure 4.19: Attribute Selection Output

  • 41

    The implementation for the other data sets, namely UDP Flood, TCP SYN Flood and the data from reliable sources, is not shown because it follows the same steps as the ICMP Flood data. The results for each data set are analyzed in the next chapter, Chapter V: Result and Analysis.

  • 42

    CHAPTER V: RESULT AND ANALYSIS

    This chapter discusses the results of the experiments described in Chapter 4. The results are organized into sections by data set. Section 5.1 discusses the data obtained from reliable sources, Section 5.2 discusses the ICMP Flood attack data, and Section 5.3 discusses the TCP SYN Flood attack data.

    5.1 Reliable Data

    The protocols involved in the pcap can be viewed in the protocol classifier tree. SIP, RTCP, RTP and HTTP were the protocols involved, as shown in the run information below:

    === Run information ===

    Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2

    Relation: reliabledata

    Instances: 4447

    Attributes: 7

    No.

    Time

    Source

    Destination

    Protocol

    Length

    Info

    Test mode: split 66.0% train, remainder test

  • 43

    === Classifier model (full training set) ===

    J48 pruned tree

    ------------------

    Time 201.325642

    Length 202: RTP (3003.0/15.0)

    Number of Leaves: 10

    Size of the tree: 19

    Time taken to build model: 0.06 seconds

  • 44

    SIP is a signaling protocol used for controlling multimedia

    communication sessions, like voice or video calls over IP. The protocol can be

    used for modifying, creating and terminating two-party or multiparty sessions

    consisting of one or several media streams. In this capture file, SIP is used to

    create and tear down VoIP sessions.

    RTP defines a standardized packet format for delivering audio and video over the Internet. RTP is usually used in conjunction with RTCP. When the two are used together, RTP is usually originated and received on even port numbers, whereas RTCP uses the next higher odd port number. In this capture file, RTP is used as the media protocol to transport voice.

    RTCP partners with RTP in the delivery and packaging of multimedia

    data, but does not transport any media streams itself. RTCP itself does not

    provide any flow encryption or authentication methods.

    HTTP is a request-response protocol standard for client-server

    computing. In this capture file, HTTP is used to communicate with the GUI

    frontend of the SIP PBX.

    === Run information ===

    Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2

    Relation: reliabledata

    Instances: 4447

    Attributes: 7

    No.

    Time

    Source

    Destination

    Protocol

  • 45

    Length

    Info

    Test mode: split 66.0% train, remainder test

    === Classifier model (full training set) ===

    J48 pruned tree

    ------------------

    Protocol = SIP

    | Source = 172.25.105.43: Request: OPTIONS sip:[email protected] | (1.0)

    | Source = 172.25.105.40

    | | Length 574

    | | | No. 1298: Status: 401 Unauthorized | (2.0)

    | Source = 172.25.105.3

    | | No. 1302: Request: ACK sip:[email protected] | (3.0/1.0)

    | | No. > 1: User-Agent: Asterisk PBX 1.6.0.10 | FONCORE

    At the beginning of the attack, the attacker at 172.25.105.43 sent a SIP OPTIONS request for extension 100 at 172.25.105.40. The host 172.25.105.40 responded to the request with a 200 OK response. The information useful to the attacker is the User-Agent header field of the response. From it, the attacker learns that the target is an Asterisk PBX running a FONCORE (trixbox family) distribution. With these clues in hand, the attacker tried to connect to the box over HTTP.
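How a User-Agent value is read out of a SIP response can be sketched as follows. The sample message is hypothetical (the address, branch value and product name are placeholders, not taken from the capture), and the parser is deliberately minimal: it ignores folded header lines and repeated headers.

```python
def parse_sip_headers(message):
    """Return (start_line, headers) for a SIP message; header names are lower-cased."""
    head = message.split("\r\n\r\n", 1)[0]   # drop the body, if any
    lines = head.split("\r\n")
    start_line, headers = lines[0], {}
    for line in lines[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return start_line, headers


# Hypothetical 200 OK response to an OPTIONS request.
SAMPLE_RESPONSE = (
    "SIP/2.0 200 OK\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK-example\r\n"
    "User-Agent: ExamplePBX 1.0\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
```

Because the header is sent in clear text in every response, a single OPTIONS probe is enough to fingerprint the server software, which is why hardened deployments often blank or rewrite this field.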

  • 46

    5.2 ICMP Echo Flood

    The Internet Control Message Protocol (ICMP) enables users to send an echo packet to a remote host to check whether it is alive. In an ICMP Echo flood, these packets request replies from the victim in such volume that the bandwidth of the victim's network connection becomes saturated.
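A simple way to flag a possible echo flood in exported packet summaries is to count echo requests per source within a sliding time window. The sketch below assumes packets are given as (time, source, info) tuples sorted by time, and the window length and threshold are arbitrary illustrative values rather than figures from this project.

```python
from collections import defaultdict, deque


def flood_sources(packets, window=1.0, threshold=100):
    """Return sources that send more than `threshold` echo requests within `window` seconds."""
    recent = defaultdict(deque)   # source -> timestamps of recent echo requests
    flagged = set()
    for time, source, info in packets:
        if "Echo (ping) request" not in info:
            continue
        times = recent[source]
        times.append(time)
        # Discard timestamps that have fallen out of the window.
        while times and time - times[0] > window:
            times.popleft()
        if len(times) > threshold:
            flagged.add(source)
    return flagged
```

A rate-based check of this kind complements the classifier output: the decision tree shows which sources produce which message types, while the counter shows how quickly they do so.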

    === Run information ===

    Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2

    Relation: icmp

    Instances: 6

    Attributes: 7

    No.

    Time

    Source

    Destination

    Protocol

    Length

    Info

    Test mode: split 66.0% train, remainder test

    === Classifier model (full training set) ===

    J48 pruned tree

    ------------------

    Source = 10.2.10.2: Echo (ping) request id=0x0200, seq=9472/37, ttl=32 [ETHERNET FRAME CHECK SEQUENCE INCORRECT] (3.0/2.0)

    Source = 10.2.99.99: Destination unreachable (Host unreachable) [ETHERNET FRAME CHECK SEQUENCE INCORRECT] (3.0)

  • 47

    Number of Leaves : 2

    Size of the tree : 3

    Time taken to build model: 0.01 seconds

    === Evaluation on test split ===

    Time taken to test model on training split: 0 seconds

    === Summary ===

    Correctly Classified Instances 0 0 %

    Incorrectly Classified Instances 2 100 %

    Kappa statistic 0

    Mean absolute error 0.5

    Root mean squared error 0.6124

    Relative absolute error 133.3333 %

    Root relative squared error 141.4214 %

    Coverage of cases (0.95 level) 0 %

    Mean rel. region size (0.95 level) 50 %

    Total Number of Instances 2

    === Detailed Accuracy By Class ===

    TP Rate  FP Rate  Precision  Recall  F-Measure  MCC  ROC Area  PRC Area  Class

    0.000  0.000  0.000  0.000  0.000  0.000  ?  ?  Echo (ping) request id=0x0200, seq=9472/37, ttl=32 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

  • 48

    0.000  0.000  0.000  0.000  0.000  0.000  ?  1.000  Destination unreachable (Host unreachable) [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

    0.000  1.000  0.000  0.000  0.000  0.000  ?  ?  Echo (ping) request id=0x0200, seq=9728/38, ttl=32 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

    0.000  0.000  0.000  0.000  0.000  0.000  ?  ?  Echo (ping) request id=0x0200, seq=9984/39, ttl=32 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

    Weighted Avg.  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000

    === Confusion Matrix ===

    a b c d

  • 49

    5.3 TCP SYN Flood

    SYN flooding attacks exploit TCP's three-way handshake mechanism and its limitation in maintaining half-open connections. When a server receives a SYN request, it returns a SYN/ACK packet to the client. Until the SYN/ACK packet is acknowledged by the client, the connection remains in a half-open state for a period of up to the TCP connection timeout.
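The effect of unacknowledged SYN requests on a server can be sketched with a toy model of the half-open connection queue. The event format, the backlog size and the absence of timeouts are simplifying assumptions made for illustration; real TCP stacks also expire half-open entries and may use defences such as SYN cookies.

```python
def simulate_backlog(events, backlog=128):
    """Replay ("SYN" | "ACK", connection_id) events; return (half_open, dropped)."""
    half_open = set()
    dropped = 0
    for kind, conn in events:
        if kind == "SYN":
            if conn in half_open:
                continue              # retransmitted SYN for a pending connection
            if len(half_open) >= backlog:
                dropped += 1          # queue full: the new connection is refused
            else:
                half_open.add(conn)   # server replies SYN/ACK and waits
        elif kind == "ACK":
            half_open.discard(conn)   # handshake completed, slot is freed
    return len(half_open), dropped
```

In the model, a handful of spoofed SYNs that are never acknowledged fill the queue, after which a legitimate client's SYN is dropped even though the server is otherwise idle.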

    === Run information ===

    Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2

    Relation: tcp

    Instances: 9

    Attributes: 7

    No.

    Time

    Source

    Destination

    Protocol

    Length

    Info

    Test mode: split 66.0% train, remainder test

    === Classifier model (full training set) ===

    J48 pruned tree

    ------------------

    Source = 192.168.0.1: boinc-client > neod2 [ACK] Seq=1 Ack=1 Win=8760 Len=0 [ETHERNET FRAME CHECK SEQUENCE INCORRECT] (3.0/2.0)

  • 50

    Source = 192.168.0.2: [TCP Retransmission] neod2 > boinc-client [PSH, ACK] Seq=5841 Ack=1 Win=8760 Len=648 [ETHERNET FRAME CHECK SEQUENCE INCORRECT] (6.0/1.0)

    Number of Leaves: 2

    Size of the tree: 3

    Time taken to build model: 0 seconds

    === Evaluation on test split ===

    Time taken to test model on training split: 0 seconds

    === Summary ===

    Correctly Classified Instances 1 33.3333 %

    Incorrectly Classified Instances 2 66.6667 %

    Kappa statistic 0.1429

    Mean absolute error 0.2667

    Root mean squared error 0.483

    Relative absolute error 84.6154 %

    Root relative squared error 116.1347 %

    Coverage of cases (0.95 level) 33.3333 %

    Mean rel. region size (0.95 level) 26.6667 %

    Total Number of Instances 3

    === Detailed Accuracy By Class ===

    TP Rate  FP Rate  Precision  Recall  F-Measure  MCC  ROC Area  PRC Area  Class

  • 51

    0.000  0.000  0.000  0.000  0.000  0.000  0.500  0.333  boinc-client > neod2 [ACK] Seq=1 Ack=1 Win=8760 Len=0 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

    0.000  0.000  0.000  0.000  0.000  0.000  0.500  0.333  neod2 > boinc-client [PSH, ACK] Seq=5841 Ack=1 Win=8760 Len=648 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

    0.000  0.333  0.000  0.000  0.000  0.000  ?  ?  boinc-client > neod2 [ACK] Seq=1 Ack=2921 Win=8760 Len=0 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

    0.000  0.000  0.000  0.000  0.000  0.000  ?  ?  boinc-client > neod2 [ACK] Seq=1 Ack=5841 Win=8760 Len=0 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

    1.000  0.500  0.500  1.000  0.667  0.500  0.750  0.500  [TCP Retransmission] neod2 > boinc-client [PSH, ACK] Seq=5841 Ack=1 Win=8760 Len=648 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

    Weighted Avg.  0.333  0.167  0.167  0.333  0.222  0.167  0.583  0.389

    === Confusion Matrix ===

    a b c d e   <-- classified as

    0 0 1 0 0 | a = boinc-client > neod2 [ACK] Seq=1 Ack=1 Win=8760 Len=0 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

    0 0 0 0 1 | b = neod2 > boinc-client [PSH, ACK] Seq=5841 Ack=1 Win=8760 Len=648 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

    0 0 0 0 0 | c = boinc-client > neod2 [ACK] Seq=1 Ack=2921 Win=8760 Len=0 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

    0 0 0 0 0 | d = boinc-client > neod2 [ACK] Seq=1 Ack=5841 Win=8760 Len=0 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]

  • 52

    0 0 0 0 1 | e = [TCP Retransmission] neod2 > boinc-client [PSH, ACK] Seq=5841 Ack=1 Win=8760 Len=648 [ETHERNET FRAME CHECK SEQUENCE INCORRECT]
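The Kappa statistic in the summary can be recomputed from the confusion matrix as a consistency check. In the sketch below, rows b to e are copied from the output above; the counts for row a do not survive in the extracted text, so the row used here (one instance of class a classified as c) is an assumption inferred from the per-class rates, where class c shows an FP rate of 0.333 over three test instances.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column totals for each class.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


TCP_MATRIX = [
    [0, 0, 1, 0, 0],  # a (assumed row, see the note above)
    [0, 0, 0, 0, 1],  # b
    [0, 0, 0, 0, 0],  # c
    [0, 0, 0, 0, 0],  # d
    [0, 0, 0, 0, 1],  # e
]
```

With this matrix the observed agreement is 1/3 and the chance agreement is 2/9, giving a kappa of 1/7, which rounds to the 0.1429 shown in the summary.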

    From the information above, the file begins with standard TCP ACK packets sent between 192.168.0.1 and 192.168.0.2. When TCP sends a packet to a destination and does not get a reply, it waits a specified amount of time and then retransmits the original packet. If a response is still not received, the source (transmitting) computer doubles the amount of time it waits for a response before sending another retransmission. Once all retransmission attempts have failed, the connection is considered to have failed and the data in the transmission is lost.
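The doubling of the wait time described above can be sketched as a short schedule generator. The initial timeout and the number of retries are illustrative parameters; actual values depend on the operating system's TCP implementation.

```python
def retransmission_schedule(initial_timeout, max_retries):
    """Return the wait before each retransmission, doubling after every attempt."""
    waits = []
    timeout = initial_timeout
    for _ in range(max_retries):
        waits.append(timeout)
        timeout *= 2   # exponential backoff: double the wait each time
    return waits
```

Starting from a one-second timeout with four retries, the waits are 1, 2, 4 and 8 seconds, so the sender gives up after 15 seconds in total.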

  • 53

    CHAPTER VI: CONCLUSION

    This chapter contains the conclusion together with recommendations and suggestions for future improvement and enhancement of the project, drawn from the testing and results. The essence of the study is to analyze VoIP traffic traces using WEKA, a data mining tool. We believe that the objective has been achieved.

    6.1 Project Accomplishment

    In the early days of VoIP, there was no big concern about security issues

    related to its use. People were mostly concerned with its cost, functionality and

    reliability. Now that VoIP is gaining wide acceptance and becoming one of the

    mainstream communication technologies, security has become a major issue. To address this problem, network forensics provides for the monitoring and analysis of computer network traffic for the purposes of information gathering, legal evidence or intrusion detection.

    This project started with converting pcap (Packet Capture) files into the Attribute-Relation File Format (arff), which WEKA recognizes, and with learning how to analyze the data using WEKA Explorer preprocessing, classification, clustering and attribute selection, before obtaining data from a company that provides VoIP analysis.

    We believe that the objectives set for this project have been met. The first objective was to analyze the pattern of attack data from the captured data, where the data indicate the condition of the network events.

  • 54

    The second objective was also achieved: to convert the pcap data to arff data files so that the input is recognized by the WEKA data mining tool. It is important to note that the first objective depends on this second objective. We encountered some difficulty in obtaining the right data for our analysis, since many companies are bound by legal constraints that prevent them from sharing their data with us. However, we were still able to use simulated data from a related project conducted by another student at UniKL. With real attack data, our research would have produced more interesting findings.

    6.2 Future Recommendation

    For future work, a few aspects can be further enhanced by expanding some features and criteria to make the analysis more robust.

    | Suggestion for Improvement | Current Project Situation |
    |----------------------------|---------------------------|
    | Improve the data set or create a traffic simulation program to collect the required data. | As the data in this project are not related to VoIP attacks due to certain problems, purely collected data related to VoIP attacks can be analyzed in future enhancements. |
    | Include more types of attacks related to VoIP. Different types of VoIP attacks such as Vishing (VoIP Phishing), Eavesdropping, and Identity and service theft can also be used in order to find different results. | Only looking for SPIT and MiTM attacks. |
    | Expand the analytical knowledge by using WEKA's Simple CLI interface. Scripts can be written to allow the data processing to be executed automatically. | Analysis using the GUI interface is user friendly, but would not speed up the process. |

  • 55
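The scripting suggestion can be sketched as a small helper that builds a WEKA command line for each arff file. The J48 options mirror the scheme line in the run information of Chapter V (-C 0.25 -M 2) and the 66% split; the location of weka.jar and the file names are hypothetical, and the commands are only constructed here, not executed.

```python
def build_weka_command(arff_path, weka_jar="weka.jar", split_percent=66):
    """Return the argument list for one J48 run with a percentage split."""
    return [
        "java", "-cp", weka_jar,         # weka_jar path is an assumption
        "weka.classifiers.trees.J48",
        "-C", "0.25", "-M", "2",         # pruning confidence, min instances per leaf
        "-t", arff_path,                 # training file
        "-split-percentage", str(split_percent),
    ]


def build_batch(arff_paths, **kwargs):
    """Map each arff file to the command that would analyze it."""
    return {path: build_weka_command(path, **kwargs) for path in arff_paths}
```

Each command list can be handed to `subprocess.run` and its output saved to a text file, so that the ICMP, UDP and TCP data sets are processed with identical settings in a single pass.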

    In conclusion, we would like to highlight that VoIP security is one of the concerns raised by the VoIP community. Although the problem is still under control, system administrators are currently not equipped with the right tools to detect VoIP attacks as early as possible. In most cases Wireshark or another network sniffer is used to determine the condition of the network. We aim to provide system administrators with an alternative tool in the form of pattern reports produced by a data mining tool such as WEKA.

  • 56

    REFERENCES

    [1] A Brief History of VoIP Document One - The Past. Hallock, Joe. 2004.

    [2] Amna Saad. Secure VoIP Performance Measurement. 2013.

    [3] How Does VoIP Work? discusstech.org. [Online] [Cited: November 17,

    2013.] http://discusstech.org/2011/05/how-does-voip-work/.

    [4] Voice over IP: Forensic Computing Implications. Matthew Simon. 2006.

    [5] The Difference Between VoIP and PSTN Systems. webopedia.com. [Online] [Cited: November 17, 2013.] http://www.webopedia.com/DidYouKnow/Internet/2008/VoIP_POTS_Difference_Between.asp.

    [6] On the Feasibility of Launching the Man-In-The-Middle Attacks on VoIP from Remote Attackers. Ruishan Zhang, Xinyuan Wang, Ryan Farley, Xiaohui Yang, Xuxian Jiang. 2009.

    [7] Man-in-the-Middle Attacks. schneier.com. [Online] July 15, 2008. [Cited: November 18, 2013.] http://www.schneier.com/blog/archives/2008/07/maninthemiddle_1.html.

    [8] Security Threats In VoIP. voip.about.com. [Online] [Cited: November 18,

    2013.] http://voip.about.com/od/security/a/SecuThreats.htm.

    [9] Understanding Denial-of-Service Attacks. us-cert.gov. [Online] [Cited:

    November 18, 2013.] http://www.us-cert.gov/ncas/tips/ST04-015.

    [10] SPIT: Spam Over Internet Telephony. asteriskblog.com. [Online] [Cited: November 19, 2013.] http://www.asteriskblog.com/spit-spam-over-internet-telephony.

  • 57

    [11] Network Forensics 101: Finding the Needle in the Haystack. WildPackets

    white paper.

    [12] Network Forensic. cyberforensics.in. [Online] [Cited: November 20, 2013.] http://www.cyberforensics.in/(A(cos8NMWQywEkAAAAODMwODM4YWMtNWFmZC00ZWNhLThkNDEtNTlhMWM3MGE5MzA5hkCziwldj9ts_CCtkjYQI68akds1))/Research/NetworkForensics.aspx?AspxAutoDetectCookieSupport=1.

    [13] Network Forensics & Packet Capture Analysis. ipcopper.com. [Online]

    [Cited: November 20, 2013.] http://www.ipcopper.com/data_analysis.htm.

    [14] Data Mining: What is Data Mining? anderson.ucla.edu. [Online] [Cited: November 20, 2013.] http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm.

    [15] Data mining with WEKA,Part 1: Introduction and regression. [Online]

    [Cited: November 20, 2013.]

    http://www.ibm.com/developerworks/library/os-weka1/.

    [16] Weka - Modified for Data Mining Course at WPI. [Online] [Cited:

    November 21, 2013.] http://davis.wpi.edu/~xmdv/weka/.

    [17] Introduction to Weka - A Toolkit for Machine Learning.

    [18] Skype Forensic in Android Devices. Mohammed I. Al-Saleh & Yahya A. Forihat. 2013.

    [19] Network Forensics Models for Converged Architectures. Juan C. Pelaez & Eduardo B. Fernandez. 2010.

    [20] Security patterns for Voice over IP Networks. Eduardo B. Fernandez, Juan

    C. Pelaez and Maria M. Larrondo-Petrie. 2007.

  • 58

    [21] Enhancing Forensic Investigation in Large Capacity Storage Devices using

    WEKA: A Data Mining Tool. Lanka, Shravya. 2011.

    [22] The Rapid Application. Issam J Zeinoun Cambridge Technology

    Enterprises, Inc. 2005.

    [23] Rapid Application Development. Core Partners Inc. s.l.:

    www.corepartners.com.

    [24] Advantages of Rapid Application Development. buzzle.com. [Online] 200-2013. [Cited: December 12, 2013.] http://www.buzzle.com/articles/advantages-of-rapid-application-development.html
