Nate French
Final Project
Internship
Prof. Van Slyke
Analyzing Digital Forensics Tools in a Virtualized Environment
Analyzing Digital Forensics Tools in a Virtualized Environment
The purpose of this paper is to research the viability of using digital forensic software in a virtual
environment. The first topic to be addressed is whether specific tools will perform
competently within a virtual machine. This form of virtualization, platform virtualization, has
been available since the 1990s. Virtualization has progressed immensely since it was first
released; however, it is still a work in progress (Sperling, 2010). The second area of focus will be benchmarking the speed of a virtualized environment compared to a non-virtualized environment. As the price per unit of storage continues to fall, the speed at which digital forensics frameworks can perform will continue to be a significant factor. A third area of
focus will be to assess the accuracy of this software when run from a virtual machine compared
to a non-virtualized environment.
The acceptance of digital forensics depends solely on the accuracy of the tools and
environment used. With the increasing acceptance of virtualization and cloud computing for both
cost and performance gains, the advent of distributed forensic frameworks is inevitable
(Roussev, 2004). Therefore, determining the accuracy of these environments is crucial for
continued acceptance of digital forensics. Through testing, this research will provide statistical
data to be used for determining the best approach to performing digital forensics. This testing
will be by no means conclusive or definitive on this topic as there is a plethora of variables that
could be tweaked to determine an optimal environment for performing virtualized digital
forensics. However, this research will provide a base for continued research in this area.
This research paper will first explain the key terms and ideas that will be used throughout. It will then establish the importance of performing the research; after that, the framework for performing the research, the hardware, and the software used will be discussed. Finally, it will be possible to discuss and evaluate the results gathered from the experimentation.
A. Introduce Topic and Define Key Terms
1. Virtualization The central concept underlying this research is virtualization: what it is and what it does. Virtualization refers to the method of creating a virtual
object that acts and behaves like the real object. With platform virtualization, it is possible to
create a second instance of an operating system running on the same hardware as the initial
instance. For example, it is possible to run an instance of a Linux operating system on a
computer that natively runs Microsoft Windows. In this instance, the Linux operating system is
referred to as the guest, while the Windows instance is called the host. The guest instance is a
logically segregated system that is sharing the resources of the host system. The two instances
can share the processing time of the central processing unit (CPU), or, if more than one CPU is present, individual CPUs can be allocated to each instance. A logical partition of the disk space available on
the hard drive will be segregated for use by the guest system. The guest will also receive a
dedicated portion of the random access memory (RAM), which is used for quick access to
important data that are used frequently.
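Because the experiment described later uses VirtualBox, the resource allocation sketched above can be made concrete with VirtualBox's command-line front end. The following Python sketch is illustrative only: the guest name, memory size, CPU count, and disk size are hypothetical values, not the settings used in this experiment, and VBoxManage must be installed and on the PATH.

    import subprocess

    def vbox(*args):
        # Thin wrapper around the VBoxManage command-line tool.
        subprocess.run(["VBoxManage", *args], check=True)

    # Register a Windows 7 guest, then dedicate RAM, CPUs, and a virtual disk to it.
    vbox("createvm", "--name", "win7-guest", "--ostype", "Windows7_64", "--register")
    vbox("modifyvm", "win7-guest", "--memory", "8192", "--cpus", "4")    # 8 GB RAM, 4 CPUs
    vbox("createhd", "--filename", "win7-guest.vdi", "--size", "51200")  # 50 GB disk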
2. Accuracy The second aspect of this research project is testing the accuracy and
speed of the forensic tools. The accuracy portion of this research refers to file hashing and the
number of files discovered. File hashing is the process of creating a unique identifying string for
a file, similar to the idea of a fingerprint. Even a minuscule change in a file results in a wildly
different file hash. A file hash is used in the court of law to prove that one file is the exact
replica of another file, or that a forensic image of a hard drive is a replica of the original. The
second portion of this area is determining whether forensic tools run in a virtual environment will discover the same number of files on a computer as when the examination is performed in a
non-virtualized environment. This is important to ensure that the examiner will discover all files
of interest to the case. If the virtualized environment cannot discover all files, then the validity
of performing virtualized investigations is negated. This relates to the rate of error: does the virtualized environment experience the same rate of error as a non-virtualized environment (hopefully 0%)?
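To make the hashing concept concrete, the following Python sketch hashes a file in fixed-size chunks, the way forensic tools hash evidence files too large to fit in memory. The helper name is hypothetical; FTK performs its own hashing internally.

    import hashlib

    def hash_file(path, algorithm="md5", chunk_size=1 << 20):
        # Read the file in 1 MB chunks so arbitrarily large files can be hashed.
        digest = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Even a one-byte difference produces a completely different digest:
    print(hashlib.md5(b"evidence").hexdigest())
    print(hashlib.md5(b"evidencE").hexdigest())  # unrelated to the line above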
3. Speed The speed of the different environments is not a critical factor from a legal
perspective, as the law is only concerned with the accuracy of the results. However, speed is
important when determining what type of environment to use. If the speed of the case creation
process is severely hampered by the use of a virtualized environment, then that fact may negate
any of the benefits of using the virtualized environment. When evaluating the speed of this
process, we are concerned with the processing and indexing features of the Forensic Tool Kit
(FTK). In order to create a case, FTK must process and index an image file of a drive.
Processing is the act of determining where a file starts and stops, the file type, and the contents of
the file. That file is then indexed into a case database file. This process is essentially
enumerating the data in the evidence so that they are searchable by FTK. This form of indexing
allows the investigator to quickly search the image for evidence with different methods. The
first method that will be tested is keyword searching. In this method the investigator can search
the entire drive for a specific keyword that may be of evidentiary value. The second search method,
which can be a very powerful method for returning specific evidence, is regex searching. In this
method, the investigator can search the image for a specific pattern of data. An example of this
would be searching the image for credit card numbers. To perform this search manually would
be very time intensive, especially if the suspect attempted to hide the data in any manner. Since credit card numbers typically follow a specific pattern, a regex search would allow the investigator to find all credit cards of a certain pattern with one search. During the testing phase of this research, the speed of keyword searching and regex searching will be documented for each environment. The last forensic ability to be tested is the ability of FTK to perform file
carving. File carving is the act of carving deleted files out of unallocated space. File carving is
an important part of digital forensics as the most critical evidence is typically deleted by a
suspect. Proving that a virtualized environment can carve files as competently as a non-
virtualized environment is paramount for supporting a virtualized forensic framework.
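To illustrate the two search styles just described, the following Python sketch uses a generic regular expression for 16-digit card numbers. This is not FTK's regex dialect, and a production pattern would also validate the Luhn check digit to cut false positives.

    import re

    # Illustrative pattern: four groups of four digits, optionally separated
    # by spaces or dashes (e.g., 4111-1111-1111-1111).
    CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

    text = "order notes: card 4111 1111 1111 1111, backup 5500-0000-0000-0004"
    print(CARD_PATTERN.findall(text))  # both numbers found in one pass

File carving can be sketched in the same spirit: scan raw bytes from unallocated space for known header and footer signatures. The JPEG markers below are standard, but a real carver such as FTK's must also handle fragmented files and many more formats.

    # Toy carver: extract candidate JPEGs lying between the start-of-image
    # (FF D8 FF) and end-of-image (FF D9) markers in a raw byte blob.
    JPEG = re.compile(rb"\xff\xd8\xff.*?\xff\xd9", re.DOTALL)

    def carve_jpegs(raw_bytes):
        return [match.group() for match in JPEG.finditer(raw_bytes)]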
One of the main contributing factors to speed, as reported in “Quantifying Hardware Selection” (n.d.), is the
input/output (I/O) channels on the computer. These channels consist of cables and busses
between the various hardware components of the computer. The optimal configurations explained in that document will be used in the setup of the experiment and discussed more thoroughly later in this paper.
As discussed previously, virtualization and distributed frameworks are becoming more
commonplace in professional environments. Both have their pros and cons compared to their
opposites. One of the main pros of virtualized environments is the use of snapshots: the ability to create a restore point of a virtual machine at any given time. Generally this is done
once the optimal configuration of the virtual machine is complete. Over time, computers
generally get bloated with unnecessary data and slow down. With the use of a snapshot, the
machine can almost instantaneously be reverted back to its pristine condition. This is important
for maintaining an efficient setup. The main con of a virtual environment is that the virtual
machine will always have fewer resources than the machine hosting it. This means a virtual
machine generally takes longer to perform tasks than the host machine would, as it has fewer
resources available to perform the operation. With cloud computing this is generally a non-issue
because the customer can purchase the same amount of resources as their host computers
(Roussev, 2004). However, cloud computing means those resources are located in a remote
setting that is not under the direct control of the user. The beneficial aspect of a distributed
framework is that one job can be parsed out to multiple machines, generally improving the speed
at which the operation is completed (Roussev & Richard, 2004). However, a distributed framework requires
competent network between devices, a protocol to share data, and a method of organizing the
work distribution. The negative attribute of a distributed framework is that the user is
introducing more variables in the system. With more variables comes an increased chance that
something can go wrong.
B. Establish the importance of testing forensic tools. Testing forensics tools is
important for numerous reasons. Most importantly, testing forensic tools before using them in a
production environment is a necessary and critical step in gaining legal acceptance. In order for
a tool to produce legally accepted evidence pertinent to a criminal case, there has to exist
research that shows conclusively what the tool is and is not capable of. The importance of digital
forensics software will continue to grow as the importance of digital technology grows in
society. As technology, the Internet, and E-commerce continue to grow, so will the prevalence
of cybercrime. A second reason for the increase of cybercrime is the rise of “bring your own
device” (BYOD) in professional settings. In general, corporate networks are more secure than
home-based networks and devices. With the rise of BYOD, employees are frequently taking
devices in and out of corporate networks. This increases the vectors of attack, as these devices leave protected networks for environments where the chance of infection is much greater. These infections can then migrate onto corporate networks, where the rewards of cybercrime will generally be greater. Digital forensic tools are critical in performing cybercrime investigations. E-commerce is
estimated to grow at a rate of 12%-15% every year. In the United States alone, E-commerce
sales for the year of 2012 reached $225 billion. By the year 2017, E-commerce is expected to hit
$435 billion in yearly revenue (Trends and Data, 2012). In contrast, it is much more difficult to
come up with accurate statistics on the global cost of cybercrime (McAfee, 2013). Current research places an upper limit on cybercrime at around 0.5%-1% of national income. Extrapolation of these data leads to a lower range of $25 billion to an upper range of $140 billion for the United
States. Regardless of the actual numbers, investigators need competent tools to investigate
cybercrime. This issue becomes even more important when viewed from a national security
perspective.
Cyber-warfare and cyber-espionage are the new domains of inter-country competition.
For example, in 2012, the U.S. Navy was experiencing on average 110,000 cyber-attacks per
hour (Worth, 2012). Cyber-attacks, when viewed from a national security perspective, can
compromise intelligence, the security of troops around the globe, national secrets, and classified
technology. When investigating such incidents, it is critical to know exactly what happened,
what may have been taken, and what sensitive data may have been compromised. It is no longer
the act of recovering fraudulent funds, but protecting the lives and security of our country.
Forensic tools are the backbone of the military’s process of “cyber-attack recovery, reaction, and
response functions” (Giordano, 2002).
The second main reason for performing this experiment is the growth of magnetic storage
devices. This research will attempt to benchmark the speed of forensic software in a virtual
environment. This is important because the growth of magnetic storage is exponential
and is expected to continue growing at this rate for a long time. The price per megabyte is
expected to shrink at a rate of 48% each year (Webb, 2003). As the size of storage media
continues to grow, speed will continue to become a critical factor in performing digital forensics.
Consider the fact that when commercial magnetic storage was introduced in 1956, the cost per
megabyte was $10,000. The 2013 cost per megabyte for magnetic storage is now $0.00006 (citation needed). New methods for data storage are continually being researched, with many producing spectacular results. One outstanding example is research performed by IBM that
has found the current atomic limit to magnetic data storage. The experimental storage device is
“at least 100 times denser than today’s hard disk drives” (Loth et al., 2012).
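As a quick arithmetic check using only the two price points quoted above, the implied long-run average annual decline can be computed as follows; it comes out lower than the 48% forward-looking rate cited from Webb, which describes the recent trend rather than the whole 1956-2013 period.

    # Implied average annual decline in price per megabyte, using the 1956
    # ($10,000/MB) and 2013 ($0.00006/MB) figures quoted in the text above.
    price_1956 = 10_000.0
    price_2013 = 0.00006
    years = 2013 - 1956

    decline = 1 - (price_2013 / price_1956) ** (1 / years)
    print(f"{decline:.1%}")  # roughly 28% per year, sustained for 57 years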
In contrast to the growth of magnetic storage, the growth of processors, the devices that process all of these data, is expected to stall within the decade. This is known as the death of Moore's Law, which has described the growth of processing speed from the 1960s and is projected to hold until 2020. On average, the processing speed of digital devices has doubled every 18 months. However, this process of increasing the density of transistors will reach its physical limit in 2020, when transistors reach the 7 nm or 5 nm mark. At this point, the explosive growth of processing speed will end while magnetic storage continues to grow. DARPA tracks projects to replace complementary metal-oxide-semiconductor (CMOS) technology; however, only three of these replacements are potential candidates, and even then they are not very promising (Merritt, 2013). Investigators will then be faced with the knowledge that evidence drives will
likely continue to grow exponentially while processing power will grow linearly. Due to this
knowledge, the speed that evidence can be processed will be a critical factor in performing
digital forensics. It is critical that investigators use the most optimal environments when time is
a factor. The diminishing growth of processing speed will in turn create the necessity for cloud
computing and distributed frameworks. Cloud computing is the concept of using multiple computers, geographically separated but able to communicate through networks, working in tandem. Typical commercial cloud computing options, such as EC2 from Amazon, allow the
customer to use a set amount of processing power and space. The cloud computer instances are
generally virtual machines located on a much larger commercial server, which is one reason this
research will investigate the use of forensic tools in a virtual environment. A distributed
framework is a method of parsing the work of one job between multiple computers in an
organized fashion. FTK already has a Distributed Forensic Framework (DFF) that allows three
remote computers to assist a fourth workstation. This DFF will likely be expanded upon in the
future to allow the support of more machines. One further avenue of investigation for this line of
research is combining this DFF with the use of virtual machines. In order to determine these
optimal environments, further testing must be performed.
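The work-parsing idea behind a distributed framework can be sketched on a single machine with Python's multiprocessing module. This is a toy stand-in for a true networked framework such as FTK's DFF; the evidence file name and chunk sizes are hypothetical.

    import hashlib
    from multiprocessing import Pool

    def hash_range(job):
        # Worker: hash one byte range of the evidence file.
        path, offset, length = job
        with open(path, "rb") as f:
            f.seek(offset)
            return offset, hashlib.sha256(f.read(length)).hexdigest()

    if __name__ == "__main__":
        # Split a hypothetical 4 MB image into four 1 MB jobs for four workers.
        jobs = [("evidence.dd", i * 2**20, 2**20) for i in range(4)]
        with Pool(processes=4) as pool:
            for offset, digest in pool.map(hash_range, jobs):
                print(offset, digest)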
Testing provides numerous benefits to enhance the state of digital forensics.
Through testing, researchers can verify that a process works in a non-production environment. It
is not feasible to test software in a production environment, where the outcomes of criminal investigations depend on the results found. Testing needs to be performed before production environments are utilized. This testing supports the requirement that evidence be accepted in a court of law. Research is needed to prove whether or not a tool behaves in a
particular fashion and whether the data produced are forensically sound. Testing will also help
investigators make intelligent decisions when deciding what tools to use for a particular case.
Some tools may perform better than others when investigating certain aspects of a cybercrime.
In some cases, such as incident response investigations for national security purposes, time may
be the important factor to consider. Investigators will need to analyze computer logs to
determine what did or did not happen. Other cases may be more focused on retrieving a large number of financial records and documents, while still others may be concerned only with what pictures reside on a computer. By testing different software in different scenarios, it will be
possible to determine the best tools for each depending on what type of evidence the investigator
is interested in. Finally, proper testing should create a reproducible experiment that can
repeatedly be verified by other researchers. This factor helps to gain court acceptance by
providing scientifically sound and verifiable results.
II. Background Context
A. History of digital forensics. Digital forensics is a sub-category of the forensic
sciences that deals with the examination of evidence on digital media. Digital media include, but are not limited to, computers, removable storage, network traffic, and mobile phones. Digital
forensics became a national concern in the 1980s as digital devices began to make their way into
the corporate world. In 1984, the FBI created the CART, the Computer Analysis and Response
Team, which was the first federal group organized to deal with cybercrime and digital forensics
(FBI, n.d.). From this humble beginning, the digital forensics industry has grown into a billion-dollar-a-year industry with annual growth rates of 11% (“Digital Forensic Services in the U.S.
Market Research”, N.D.).
Digital forensic cases can encompass many different types of crimes such as
hacking or possession of contraband. However, each case will have a similar workflow. Cases begin when a crime has been detected and evidence may reside on a digital device. After
detection, the evidence is seized or acquired, following chain-of-custody protocols. The chain of
custody protocol requires that the transfer of evidence is documented whenever possession of the
evidence changes. By following this protocol, investigators can prove that unauthorized personnel did not have access to the evidence or the chance to modify it. Once the evidence media are brought to the forensics lab, the lab must take an inventory of all evidence in its
possession. This would include the number of devices, the make and model of each device, and
which case they belong to. Once the inventory is complete, the lab can begin to make images of
each device. This is accomplished by attaching the digital device to a write blocker that prevents
any changes from being made to the evidence. The imaging hardware or software will then
make a byte-by-byte copy of the device to produce an exact replica. This is important as
investigators cannot perform the investigation on the original evidence as the process would then
introduce changes to the evidence and call its validity into question. Once an image of the
evidence has been created, the original evidence would then be returned to secured storage. The
investigators can then use their tools of choice to perform the investigation. This would include
searching the device for any relevant evidence, documenting the steps and procedures used to
acquire the evidence, and organizing the evidence into a package. This is required so that, if
necessary, a third party can verify the results of the investigation.
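The byte-by-byte imaging and verification step can be illustrated with a short Python sketch. In practice this is done with FTK Imager or dedicated imaging hardware behind a write blocker; the device path and image name below are hypothetical.

    import hashlib

    def image_and_verify(source, image, chunk_size=1 << 20):
        # Copy the source byte for byte while hashing it, then re-read the
        # finished image and confirm the two hashes match, proving the copy
        # is an exact replica of the original.
        src_hash = hashlib.sha256()
        with open(source, "rb") as src, open(image, "wb") as dst:
            for chunk in iter(lambda: src.read(chunk_size), b""):
                dst.write(chunk)
                src_hash.update(chunk)
        img_hash = hashlib.sha256()
        with open(image, "rb") as img:
            for chunk in iter(lambda: img.read(chunk_size), b""):
                img_hash.update(chunk)
        return src_hash.hexdigest() == img_hash.hexdigest()

    # Hypothetical usage (Linux raw device node; a write blocker is assumed):
    # assert image_and_verify("/dev/sdb", "evidence.dd")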
The workflow and examination procedures for digital investigations were
designed around pertinent case law and requirements for acceptance of evidence in a court of
law. In order to establish guilt, the prosecution must provide evidence that proves criminality
beyond a reasonable doubt (Commonwealth v. Webster, 1850). In order for evidence to be
accepted by the court, the evidence must pass admissibility tests. To be admissible, evidence
must be relevant to the case, must be a material possession (not hearsay), and must not be
precluded by an exclusionary rule. The prosecution must be able to prove the authenticity of the
evidence, meaning the evidence is what it is represented to be. This relates to the chain of
custody mentioned earlier, as the location and possession of the evidence can be accounted for and testified to throughout its life cycle. This ensures that the evidence has not been tampered
with. Evidence must also pass the Frye test that was established in 1923 after Frye v. United
States. The Frye test requires that technical evidence must be acquired through a scientifically
proven method that has gained acceptance in that particular field of science.
III. The Experiment
1. Providing an Environment Identical to Current Investigations. The purpose of this
experiment and benchmarking is to provide reference information to benefit the Northeast Cyber
Forensics Center (NCFC) in determining the best method for performing digital forensics. The
results of these experiments will also benefit the forensics community at large by providing
concrete evidence and statistics in relation to performing digital forensics in a virtualized
environment. The main goal, however, will be benefiting the NCFC with its setup. Due to this,
the experiment will be formatted to provide as similar an environment as possible to what
currently is being used. This will determine the use of hardware, software, and procedures
during the experiment. The NCFC currently uses specialized hardware in its operations called the Forensic Recovery of Evidence Device (FRED). FREDs are high-powered workstations with abnormally high system specifications, specifically designed for performing digital forensic work. The
FRED used in this experiment has the following specifications. The central processing unit is
composed of eight Intel i7 cores. To increase performance through hyper-threading, cores are
virtualized, meaning that for every one physical core present on the system, the operating system
addresses two logical cores (“Intel Hyper-Threading Technology”, N.D.). Hyper-threading
increases performance by allowing the operating system to schedule two threads or processes to
a single core. Therefore, the FRED logically has 16 cores, 8 of which are virtual. The specific
FRED uses 16 GB of Random Access Memory (RAM), which provides a location for fast data
storage and transfer compared to the much slower physical hard drives. FREDs also contain six
hard drive bays, which provide the potential for a large amount of storage locally. Four of the
hard drive bays are hot-swappable and are connected to the system through FireWire.
These bays can have hard drives installed or removed at will even while the system is running.
FireWire is capable of transferring data at a rate of 50 MB per second. The remaining two bays
are connected to the machine through a Serial Advanced Technology Attachment (SATA). These
bays are capable of transferring data at a rate of 375 MB per second. It is important to note that
the SATA bays are not hot-swappable, meaning the device must be installed before the system is
turned on and cannot be removed during operation. The FREDs also house a native write-
blocking bay that can be used to image a hard drive. The current operating system used at the
NCFC is Windows 7 Professional 64 bit, which will be used in this experiment as a host
(Environment one). A second Windows 7 operating system will be deployed as a virtual guest on top of the Windows 7 host (Environment two). A third operating system that will be used is the Ubuntu flavor of Linux (version 12.04 LTS). Linux will be used as a host operating system upon which a third Windows 7 virtual guest environment will be located (Environment three). In order to
maintain the integrity of the experiment, no operating system updates or software updates will be
applied to the environments during the testing phase. In order to test the virtualization aspect of
this experiment, the software VirtualBox will be used in creating the environment. VirtualBox is
a free program and is capable of running on Windows or Linux. The forensic tool that will be
tested is Forensic Tool Kit (FTK) 4.0.2 provided by AccessData. FTK is a commonly used
forensics program with wide acceptance within the community. This tool is also one of the
main programs utilized at the NCFC for forensic investigation. In order to provide the best
environment possible, this experiment will utilize the I/O channel optimization (“Quantifying
Hardware Selection,” N.D.). This setup uses four drive bays to store different components of
the investigation. Drive one (SATA) is a 500 GB Seagate 7200 RPM drive, which contains the
operating systems, virtual machines, and the forensic tools. Drive two (SATA) is a 1000 GB
Seagate 7200 RPM drive, which contains the output database of results discovered during the
investigation. As discussed in Quantifying Hardware Selection, the most effective method of
increasing performance is housing the output in a separate drive with the highest throughput and
RPM. Drive three (FireWire) is a 2000 GB Seagate 7200 RPM drive, which contains the evidence images to be examined or processed. Drive four is a 1000 GB Seagate 7200 RPM drive, which contains the evidence computer. Drive four is imaged with FTK Imager, the output of which goes
into drive three. The experiment will comprise three different environments and three
different evidence packages. In order to provide typical evidence scenarios that will relate to
current operations at the NCFC, the three evidence packages will consist of a 50, 250, and a 500
GB evidence image. To maintain the integrity of the experiment, one 1000 GB Seagate 7200
RPM hard drive will be used to store these packages. All testing will be completed on each
package before testing the next package. When proceeding to the next testing package,
additional evidence will be added to the current package until the size requirements are met.
This was decided for multiple reasons. This method will reduce the amount of time required to set up the evidence package. Had we decided to use completely different evidence packages, we would have had to ensure that any data from the previous package were securely erased from the medium.
Otherwise the forensic tools could detect data from a previous package, which would
contaminate the evidence environment. Additionally, processing the same evidence multiple times provides repeatability and tests certain features such as hashing and evidence identification. The evidence medium will also have a Windows 7 operating system installed so that
imaging, processing, indexing, and search times will reflect a real current investigation. The
clean install of Windows 7 requires roughly 20 GB of space. Therefore the respective evidence
packages will contain roughly 30, 230, and 480 GB of evidence. The actual data comprising the
evidence consists of typical data types that digital forensic investigations focus on. This includes
images, documents, and videos. The images used are contained within the Dresden Image
Database for Forensics Benchmarking (Gloe, 2010). The initial photos totaled 4949 high
resolution images, which require roughly 15 GB of space. In order to quickly grow the amount
of evidence required for the 250 and 500 GB packages, these photos were run through image software, IrfanView, which provides batch processing. By processing these images through various RGB filters, it is possible to quickly accumulate evidence files with unique hashes.
Multiple video files were acquired through YouTube. To provide variation, some files are short
(5-10 minutes), while others are roughly 80-90 minutes. Numerous document files were made of
varying types using Microsoft Office and notepad. These documents also contain textual
evidence to complete string search and regex search experiments. A second source of documents
was obtained through the Enron e-mail dataset, which has been sanitized and released as public
domain material (Enron Email Dataset, 2009). This research will examine whether or not FTK
will find these evidence items in both the virtual and non-virtualized environment. During the
250 GB and 500 GB runs, some data will be deleted to determine if the virtualized environments
will competently recover deleted files.
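The batch-filtering step used to multiply the image evidence can be sketched as follows. The experiment itself used IrfanView; this Python version, built on the Pillow library, is an assumed equivalent for illustration, and the directory names are hypothetical.

    from pathlib import Path
    from PIL import Image  # Pillow, standing in for IrfanView's batch mode

    def permute_channels(src_dir, dst_dir, order=(2, 1, 0)):
        # Re-save every JPEG with its RGB channels permuted; each output
        # differs from its source at the byte level, so its hash is unique.
        out = Path(dst_dir)
        out.mkdir(parents=True, exist_ok=True)
        for path in Path(src_dir).glob("*.jpg"):
            bands = Image.open(path).convert("RGB").split()
            Image.merge("RGB", tuple(bands[i] for i in order)).save(out / path.name)

    permute_channels("dresden_originals", "dresden_bgr_variant")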
2. Designing the experiment. The National Institute of Standards and
Technology (NIST) created a methodology for testing forensic tools called the CFTT Forensic
Tool Testing Methodology. This experiment will be modeled after the guidelines provided in
this document. Step one of the CFTT is to acquire the tool to be tested. The NCFC has licensed
full versions of FTK 4, which is the focus of this research. Step two is to review relevant tool
documentation. AccessData provides numerous support documents on their website explaining
how to use FTK. Step three is to choose relevant test cases based on the features of the tool.
FTK provides support for imaging, file carving, key word searching, and regex searching. FTK
is capable of recognizing and recovering most file types. This research will test the following
data types, which are frequently sought after during cases: images, documents, e-mail, and video.
Step four is to create a testing strategy to evaluate FTK and its abilities. The testing workflow
consists of first creating an evidence drive that will contain the evidence files to be tested. This
will consist of a base Windows 7 operating system and the various document types. It is
important to ensure that the evidence packages are prepared and validated before entering them
into the evidence drive. This is due to the fact that it is difficult to permanently remove data
from the drive once it has been entered and typically requires the full drive to be zeroed out.
Once the evidence drive is ready for examination, FTK Imager will be used to create the forensic
image that is readable by FTK. When drive four is being imaged, the evidence image is piped to drive three, the image drive. Once imaging is complete, FTK processing can begin. When creating a
new case, the case folder needs to be piped to drive two, the database drive. This is where the
FTK database software has been installed and should be sufficiently large to contain the
databases for all cases in the experiment. This workflow will be completed twice for all three
environments. The processing will be completed twice for each environment to determine how
much variation exists in each environment and to satisfy minimum requirements for thorough
testing of forensic tools (Pan & Batten, n.d.). Once
processing is complete, FTK creates a result document, which lists the processing time, database
optimization times, and number of files found. Now it is possible to examine the evidence
through the FTK framework. At this stage, it is possible to test keyword and regex searching and to gather pertinent experiment data such as the number of individual file types found and file hashes. These data will be pulled into a spreadsheet so that they can be compared to subsequent cases. Once
the processing and data gathering has been completed for the 50 GB image, the subsequent 250
GB and 500 GB scenarios can be completed following this procedure. Step five of the CFTT methodology
is to analyze results and create reports. At this point, it is possible to analyze all of the data
gathered, create a formalized report, and determine which environment performed the best.
IV. Results
1. Present Results: Speed and Accuracy. The results of the research surpassed
our expectations for the experiment. At the onset of the experiment, we predicted that the
virtualized environments would be as accurate as the non-virtualized environment, but the virtual
environment would be less efficient in terms of speed. The reasoning behind this is that the
virtual environment would have fewer resources allocated. With fewer resources, it would stand to reason that the virtual environment would not be as fast. However, the results were completely
counterintuitive and the initial research showed the virtual environment was able to process the
cases faster. However, the Linux environment, which did perform faster than the non-virtualized
environment, did not produce the same levels of accuracy. The Linux environment continually
failed to discover the same number of files as the two Windows environments. The Linux
environment was tested with the 50 GB and 250 GB evidence packages, after which, we decided
to discontinue testing. As mentioned previously, the Linux environment failed to discover an
acceptable number of files and was deemed not accurate enough for further testing. The overall
results in speed are presented in the following graphs.
Figure 1 represents the total overall time each scenario took to complete processing in FTK. The
blue lines represent the basic Windows host environment with no virtualization. The red lines
represent a Windows host running a Windows guest virtual machine. In every sample the
Windows virtual machine outperformed the Windows host in total job time.
Figure 1. Total Job Time Per Each Environment and Scenario
[Bar chart omitted: total job time (h:mm:ss) for the 50 GB, 250 GB, and 500 GB scenarios; series: Windows host, Windows VM.]
The next graph, figure 2, shows the total processing time for each case. As mentioned
previously, processing is the act of enumerating the digital data and deciding where a file begins,
where a file ends, and what type of file it is. Similar to the total job time, the processing time
was faster on the virtual Windows machine than on the basic Windows host.
Figure 2. Total Processing Time Per Each Environment and Scenario
[Bar chart omitted: total processing time for each scenario; series: Windows host processing time, Windows VM processing time.]
The next graph, Figure 3, displays the total indexing time for each scenario. Indexing is the
process of taking all of the files that were processed in the previous step and creating a master file list of what each file is and where it is in relation to other files. This
portion of the processing allows for the quick search time present in FTK.
Figure 3. Total Indexing Time Per Each Environment and Scenario
[Bar chart omitted: total indexing time for each scenario; series: Windows host indexing time, Windows VM indexing time.]
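FTK's index format is proprietary, but conceptually it resembles an inverted index: a map from each term to the locations that contain it, so a keyword lookup never has to rescan the image. A minimal sketch of the idea:

    from collections import defaultdict

    def build_index(documents):
        # Map every word to the set of document IDs that contain it.
        index = defaultdict(set)
        for doc_id, text in documents.items():
            for word in text.lower().split():
                index[word].add(doc_id)
        return index

    docs = {1: "wire transfer to offshore account", 2: "routine payroll transfer"}
    index = build_index(docs)
    print(index["transfer"])  # {1, 2}, found without rescanning the documents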
The last graph, Figure 4, shows the post-processing time for each scenario. After processing has completed, FTK optimizes the indexing database. This is done by cleaning the database, which includes compacting the information and removing duplicate entries. The database FTK uses is built on the Structured Query Language (SQL), and it is common for SQL databases to run routine optimization to ensure they remain efficient.
Figure 4. Total Post Processing Time Per Each Environment and Scenario
[Bar chart omitted: total post-processing time for each scenario; series: Windows host post-processing time, Windows VM post-processing time.]
The accuracy of the virtualized environment compared to the non-virtualized environment was
exactly the same. Not only did the virtualized Windows environment find the same number of
files, this environment was also capable of correctly hashing all of the files of interest. Due to the large size of the files used to check the accuracy, it is not feasible to include a visual representation of them in this document. However, the file used to check accuracy can be found online.
V. Conclusion
Contrary to our initial assumptions, running FTK in a virtual machine was faster than running the
software in a non-virtualized environment. As stated at the beginning of this document, this
research is only the beginning of optimizing virtual environments for use in digital forensics.
At this time, we are unable to determine why the virtual environment was faster than the non-
virtualized environment. It is a possibility that having a virtual environment on the FRED
somehow slowed down the non-virtual environment and that FTK would run faster on a FRED
without any virtualization software present. From this preliminary trial, it does appear that the
virtual environment was the fastest and just as accurate as the non-virtual environment. This
certainly lays the groundwork for future research into this topic, as there are many variables that could
be tweaked. For further research, it would be prudent to determine what caused the discrepancy
in the number of files found in the Linux environment. Considering that we only tested one
Linux OS and that there are dozens of other Linux flavors, further research into the use of Linux would be worthwhile. Another area of research would be to continue optimizing the Windows
environment. This would include tweaking the memory allocation, turning off unnecessary
services, and testing scenarios with a wider variety of evidence files.
References
CFTT Methodology Overview. (2012, January 12). NIST Computer Forensic Tool Testing
Program. Retrieved October 22, 2013, from
http://www.cftt.nist.gov/Methodology_Overview.htm
Curtis, G. E. (2012). The law of cybercrimes and their investigations. Boca Raton: CRC Press.
Digital Forensic Services in the US Market Research. (n.d.). Market Research Reports. Retrieved
October 22, 2013, from http://www.ibisworld.com/industry/digital-forensic-services.html
Cohen, W. (2009, August 21). Enron Email Dataset. Retrieved October 22, 2013,
from https://www.cs.cmu.edu/~enron/
FBI, Brief History of the FBI. (n.d.). FBI.gov. Retrieved October 22, 2013, from
http://www.fbi.gov/about-us/history/brief-history
Giordano, J. (2002). Cyber Forensics: A Military Operations Perspective. International Journal
of Digital Evidence, 1(2). Retrieved October 22, 2013, from
http://www.utica.edu/academic/institutes/ecii/publications/articles/A04843F3-99E5-
632B-FF420389C0633B1B.pdf
Gloe, T., & Böhme, R. (2010). The ‘Dresden Image Database’ for benchmarking digital image
forensics. In Proceedings of the 25th Symposium on Applied Computing (ACM
SAC 2010) (Vol. 2, pp. 1585–1591).
Intel Hyper-Threading Technology. (n.d.). Retrieved November 14, 2013, from
http://www.intel.com/content/www/us/en/architecture-and-technology/hyper-
threading/hyper-threading-technology.html
Loth, S., Baumann, S., Lutz, C., & Heinrich, A. (2012). Bistability in Atomic-Scale
Antiferromagnets. Science, 335(6065). Retrieved October 22, 2013, from
http://www.sciencemag.org/content/335/6065/196.abstract
Merritt, R. (2013, August 27). Moore's Law Dead by 2022, Expert Says | EE Times. EE Times |
Electronic Engineering Times | Connecting the Global Electronics Community. Retrieved
October 22, 2013, from http://www.eetimes.com/document.asp?doc_id=1319330
Quantifying Hardware Selection in an FTK 4.0 Environment. (n.d.). DigitalIntelligence.com.
Retrieved October 22, 2013, from
www.digitalintelligence.com/files/FTK4_Recommendation.pdf
Pan, L., & Batten, L. (n.d.). Robust Correctness Testing for Digital Forensics Tools.
Roussev, V., & Richard, G. (2004). Breaking the Performance Wall: The Case for Distributed Digital Forensics. dfrws.org. Retrieved October 22, 2013, from dfrws.org/2004/day2/Golden-Perfromance.pdf
Sperling, E. (2010, October 4). The Limits Of Virtualization. Forbes. Retrieved October 22,
2013, from http://www.forbes.com/2010/10/01/enterprise-computing-challenges-
technology-cio-network-virtualization.html
The Economic Impact of Cybercrime and Cyber Espionage. (2013, July). McAfee. Retrieved October 22, 2013, from www.mcafee.com/us/resources/reports/rp-economic-impact-cybercrime.pdf
Trends & Data - Internet Retailer. (n.d.). Industry Strategies for Online Merchants – Internet
Retailer. Retrieved October 22, 2013, from
http://www.internetretailer.com/trends/sales/
Webb, K. (2003). A Rule Based Forecast of Hard Disk Drive Costs. Faculty Publications.
Paper 9. Retrieved October 22, 2013 from
http://scholarworks.sjsu.edu/cgi/viewcontent.cgi?article=1008&context=mis_pub
Worth, D. (2012, December 5). HP Discover: US Navy facing 110,000 cyber attacks every hour.
v3.co.uk. Retrieved October 22, 2013, from
www.v3.co.uk/v3-uk/news/2229651/hp-discover-us-navy-facing-110-000-cyber-attacks-
every-hour